WorldWideScience

Sample records for gene model prediction

  1. A deep auto-encoder model for gene expression prediction.

    Science.gov (United States)

    Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua

    2017-11-17

    Gene expression is a key intermediate level that genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With an aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how good genetic variants will contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic datasets with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes, align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models play a bigger role in modeling and interpreting genomics.

  2. Embryo quality predictive models based on cumulus cells gene expression

    Directory of Open Access Journals (Sweden)

    Devjak R

    2016-06-01

    Full Text Available Since the introduction of in vitro fertilization (IVF in clinical practice of infertility treatment, the indicators for high quality embryos were investigated. Cumulus cells (CC have a specific gene expression profile according to the developmental potential of the oocyte they are surrounding, and therefore, specific gene expression could be used as a biomarker. The aim of our study was to combine more than one biomarker to observe improvement in prediction value of embryo development. In this study, 58 CC samples from 17 IVF patients were analyzed. This study was approved by the Republic of Slovenia National Medical Ethics Committee. Gene expression analysis [quantitative real time polymerase chain reaction (qPCR] for five genes, analyzed according to embryo quality level, was performed. Two prediction models were tested for embryo quality prediction: a binary logistic and a decision tree model. As the main outcome, gene expression levels for five genes were taken and the area under the curve (AUC for two prediction models were calculated. Among tested genes, AMHR2 and LIF showed significant expression difference between high quality and low quality embryos. These two genes were used for the construction of two prediction models: the binary logistic model yielded an AUC of 0.72 ± 0.08 and the decision tree model yielded an AUC of 0.73 ± 0.03. Two different prediction models yielded similar predictive power to differentiate high and low quality embryos. In terms of eventual clinical decision making, the decision tree model resulted in easy-to-interpret rules that are highly applicable in clinical practice.

  3. Reranking candidate gene models with cross-species comparison for improved gene prediction

    Directory of Open Access Journals (Sweden)

    Pereira Fernando CN

    2008-10-01

    Full Text Available Abstract Background Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc. Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.

  4. Predictive modelling of gene expression from transcriptional regulatory elements.

    Science.gov (United States)

    Budden, David M; Hurley, Daniel G; Crampin, Edmund J

    2015-07-01

    Predictive modelling of gene expression provides a powerful framework for exploring the regulatory logic underpinning transcriptional regulation. Recent studies have demonstrated the utility of such models in identifying dysregulation of gene and miRNA expression associated with abnormal patterns of transcription factor (TF) binding or nucleosomal histone modifications (HMs). Despite the growing popularity of such approaches, a comparative review of the various modelling algorithms and feature extraction methods is lacking. We define and compare three methods of quantifying pairwise gene-TF/HM interactions and discuss their suitability for integrating the heterogeneous chromatin immunoprecipitation (ChIP)-seq binding patterns exhibited by TFs and HMs. We then construct log-linear and ϵ-support vector regression models from various mouse embryonic stem cell (mESC) and human lymphoblastoid (GM12878) data sets, considering both ChIP-seq- and position weight matrix- (PWM)-derived in silico TF-binding. The two algorithms are evaluated both in terms of their modelling prediction accuracy and ability to identify the established regulatory roles of individual TFs and HMs. Our results demonstrate that TF-binding and HMs are highly predictive of gene expression as measured by mRNA transcript abundance, irrespective of algorithm or cell type selection and considering both ChIP-seq and PWM-derived TF-binding. As we encourage other researchers to explore and develop these results, our framework is implemented using open-source software and made available as a preconfigured bootable virtual environment. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  5. Dinucleotide controlled null models for comparative RNA gene prediction

    Directory of Open Access Journals (Sweden)

    Gesell Tanja

    2008-05-01

    Full Text Available Abstract Background Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. Conclusion SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require

  6. Dinucleotide controlled null models for comparative RNA gene prediction.

    Science.gov (United States)

    Gesell, Tanja; Washietl, Stefan

    2008-05-27

    Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. SISSIz

  7. Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models.

    Science.gov (United States)

    Mahony, Shaun; McInerney, James O; Smith, Terry J; Golden, Aaron

    2004-03-05

    Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation. This work explores a new approach to gene-prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential. While its raw accuracy rate can be less than other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.

  8. Identifying the Gene Signatures from Gene-Pathway Bipartite Network Guarantees the Robust Model Performance on Predicting the Cancer Prognosis

    Directory of Open Access Journals (Sweden)

    Li He

    2014-01-01

    Full Text Available For the purpose of improving the prediction of cancer prognosis in the clinical researches, various algorithms have been developed to construct the predictive models with the gene signatures detected by DNA microarrays. Due to the heterogeneity of the clinical samples, the list of differentially expressed genes (DEGs generated by the statistical methods or the machine learning algorithms often involves a number of false positive genes, which are not associated with the phenotypic differences between the compared clinical conditions, and subsequently impacts the reliability of the predictive models. In this study, we proposed a strategy, which combined the statistical algorithm with the gene-pathway bipartite networks, to generate the reliable lists of cancer-related DEGs and constructed the models by using support vector machine for predicting the prognosis of three types of cancers, namely, breast cancer, acute myeloma leukemia, and glioblastoma. Our results demonstrated that, combined with the gene-pathway bipartite networks, our proposed strategy can efficiently generate the reliable cancer-related DEG lists for constructing the predictive models. In addition, the model performance in the swap analysis was similar to that in the original analysis, indicating the robustness of the models in predicting the cancer outcomes.

  9. Probability-based collaborative filtering model for predicting gene-disease associations.

    Science.gov (United States)

    Zeng, Xiangxiang; Ding, Ningxiang; Rodríguez-Patón, Alfonso; Zou, Quan

    2017-12-28

    Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene-disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expenses. We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data of humans and data of other nonhuman species, are integrated in our model. Firstly, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Secondly, we develop modified model II with personal heterogeneous regularization to enhance the accuracy of aforementioned models. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used. We compared the results of PCFM with the results of four state-of-arts approaches. The results show that PCFM performs better than other advanced approaches. PCFM model can be leveraged for predictions of disease genes, especially for new human genes or diseases with no known relationships.

  10. Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma.

    Science.gov (United States)

    Fowles, Jared S; Brown, Kristen C; Hess, Ann M; Duval, Dawn L; Gustafson, Daniel L

    2016-02-19

    Genomics-based predictors of drug response have the potential to improve outcomes associated with cancer therapy. Osteosarcoma (OS), the most common primary bone cancer in dogs, is commonly treated with adjuvant doxorubicin or carboplatin following amputation of the affected limb. We evaluated the use of gene-expression based models built in an intra- or interspecies manner to predict chemosensitivity and treatment outcome in canine OS. Models were built and evaluated using microarray gene expression and drug sensitivity data from human and canine cancer cell lines, and canine OS tumor datasets. The "COXEN" method was utilized to filter gene signatures between human and dog datasets based on strong co-expression patterns. Models were built using linear discriminant analysis via the misclassification penalized posterior algorithm. The best doxorubicin model involved genes identified in human lines that were co-expressed and trained on canine OS tumor data, which accurately predicted clinical outcome in 73 % of dogs (p = 0.0262, binomial). The best carboplatin model utilized canine lines for gene identification and model training, with canine OS tumor data for co-expression. Dogs whose treatment matched our predictions had significantly better clinical outcomes than those that didn't (p = 0.0006, Log Rank), and this predictor significantly associated with longer disease free intervals in a Cox multivariate analysis (hazard ratio = 0.3102, p = 0.0124). Our data show that intra- and interspecies gene expression models can successfully predict response in canine OS, which may improve outcome in dogs and serve as pre-clinical validation for similar methods in human cancer research.

  11. Predictive models for mutations in mismatch repair genes: implication for genetic counseling in developing countries

    Directory of Open Access Journals (Sweden)

    Monteiro Santos Erika

    2012-02-01

    Full Text Available Abstract Background Lynch syndrome (LS is the most common form of inherited predisposition to colorectal cancer (CRC, accounting for 2-5% of all CRC. LS is an autosomal dominant disease characterized by mutations in the mismatch repair genes mutL homolog 1 (MLH1, mutS homolog 2 (MSH2, postmeiotic segregation increased 1 (PMS1, post-meiotic segregation increased 2 (PMS2 and mutS homolog 6 (MSH6. Mutation risk prediction models can be incorporated into clinical practice, facilitating the decision-making process and identifying individuals for molecular investigation. This is extremely important in countries with limited economic resources. This study aims to evaluate sensitivity and specificity of five predictive models for germline mutations in repair genes in a sample of individuals with suspected Lynch syndrome. Methods Blood samples from 88 patients were analyzed through sequencing MLH1, MSH2 and MSH6 genes. The probability of detecting a mutation was calculated using the PREMM, Barnetson, MMRpro, Wijnen and Myriad models. To evaluate the sensitivity and specificity of the models, receiver operating characteristic curves were constructed. Results Of the 88 patients included in this analysis, 31 mutations were identified: 16 were found in the MSH2 gene, 15 in the MLH1 gene and no pathogenic mutations were identified in the MSH6 gene. It was observed that the AUC for the PREMM (0.846, Barnetson (0.850, MMRpro (0.821 and Wijnen (0.807 models did not present significant statistical difference. The Myriad model presented lower AUC (0.704 than the four other models evaluated. Considering thresholds of ≥ 5%, the models sensitivity varied between 1 (Myriad and 0.87 (Wijnen and specificity ranged from 0 (Myriad to 0.38 (Barnetson. Conclusions The Barnetson, PREMM, MMRpro and Wijnen models present similar AUC. The AUC of the Myriad model is statistically inferior to the four other models.

  12. Predictive models for mutations in mismatch repair genes: implication for genetic counseling in developing countries

    Energy Technology Data Exchange (ETDEWEB)

    Monteiro Santos, Erika Maria [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); International Center of Research and Training (CIPE), AC Camargo Hospital, Sao Paulo (Brazil); Silva Junior, Wilson Araujo da [Sao Paulo University, Department of Genetics, Medical School of Ribeirao Preto, Ribeirao Preto (Brazil); Carraro, Dirce Maria [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); International Center of Research and Training (CIPE), AC Camargo Hospital, Sao Paulo (Brazil); Rossi, Benedito Mauro; Valentin, Mev Dominguez [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Carneiro, Felipe [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); International Center of Research and Training (CIPE), AC Camargo Hospital, Sao Paulo (Brazil); Oliveira, Ligia Petrolini de [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Oliveira Ferreira, Fabio de; Junior, Samuel Aguiar [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Hereditary Colorectal Cancer Registry, AC Camargo Hospital, Sao Paulo (Brazil); Nakagawa, Wilson Toshihiko [Hereditary Colorectal Cancer Registry, AC Camargo Hospital, Sao Paulo (Brazil); Gomy, Israel [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Sao Paulo University, Department of Genetics, Medical School of Ribeirao Preto, Ribeirao Preto (Brazil); Faria Ferraz, Victor Evangelista de [Sao Paulo University, Department of Genetics, Medical School of Ribeirao Preto, Ribeirao Preto (Brazil)

    2012-02-09

    Lynch syndrome (LS) is the most common form of inherited predisposition to colorectal cancer (CRC), accounting for 2-5% of all CRC. LS is an autosomal dominant disease characterized by mutations in the mismatch repair genes mutL homolog 1 (MLH1), mutS homolog 2 (MSH2), postmeiotic segregation increased 1 (PMS1), post-meiotic segregation increased 2 (PMS2) and mutS homolog 6 (MSH6). Mutation risk prediction models can be incorporated into clinical practice, facilitating the decision-making process and identifying individuals for molecular investigation. This is extremely important in countries with limited economic resources. This study aims to evaluate sensitivity and specificity of five predictive models for germline mutations in repair genes in a sample of individuals with suspected Lynch syndrome. Blood samples from 88 patients were analyzed through sequencing MLH1, MSH2 and MSH6 genes. The probability of detecting a mutation was calculated using the PREMM, Barnetson, MMRpro, Wijnen and Myriad models. To evaluate the sensitivity and specificity of the models, receiver operating characteristic curves were constructed. Of the 88 patients included in this analysis, 31 mutations were identified: 16 were found in the MSH2 gene, 15 in the MLH1 gene and no pathogenic mutations were identified in the MSH6 gene. It was observed that the AUC for the PREMM (0.846), Barnetson (0.850), MMRpro (0.821) and Wijnen (0.807) models did not present significant statistical difference. The Myriad model presented lower AUC (0.704) than the four other models evaluated. Considering thresholds of ≥ 5%, the models sensitivity varied between 1 (Myriad) and 0.87 (Wijnen) and specificity ranged from 0 (Myriad) to 0.38 (Barnetson). The Barnetson, PREMM, MMRpro and Wijnen models present similar AUC. The AUC of the Myriad model is statistically inferior to the four other models.

  13. Predictive models for mutations in mismatch repair genes: implication for genetic counseling in developing countries

    International Nuclear Information System (INIS)

    Monteiro Santos, Erika Maria; Silva Junior, Wilson Araujo da; Carraro, Dirce Maria; Rossi, Benedito Mauro; Valentin, Mev Dominguez; Carneiro, Felipe; Oliveira, Ligia Petrolini de; Oliveira Ferreira, Fabio de; Junior, Samuel Aguiar; Nakagawa, Wilson Toshihiko; Gomy, Israel; Faria Ferraz, Victor Evangelista de

    2012-01-01

    Lynch syndrome (LS) is the most common form of inherited predisposition to colorectal cancer (CRC), accounting for 2-5% of all CRC. LS is an autosomal dominant disease characterized by mutations in the mismatch repair genes mutL homolog 1 (MLH1), mutS homolog 2 (MSH2), postmeiotic segregation increased 1 (PMS1), post-meiotic segregation increased 2 (PMS2) and mutS homolog 6 (MSH6). Mutation risk prediction models can be incorporated into clinical practice, facilitating the decision-making process and identifying individuals for molecular investigation. This is extremely important in countries with limited economic resources. This study aims to evaluate sensitivity and specificity of five predictive models for germline mutations in repair genes in a sample of individuals with suspected Lynch syndrome. Blood samples from 88 patients were analyzed through sequencing MLH1, MSH2 and MSH6 genes. The probability of detecting a mutation was calculated using the PREMM, Barnetson, MMRpro, Wijnen and Myriad models. To evaluate the sensitivity and specificity of the models, receiver operating characteristic curves were constructed. Of the 88 patients included in this analysis, 31 mutations were identified: 16 were found in the MSH2 gene, 15 in the MLH1 gene and no pathogenic mutations were identified in the MSH6 gene. It was observed that the AUC for the PREMM (0.846), Barnetson (0.850), MMRpro (0.821) and Wijnen (0.807) models did not present significant statistical difference. The Myriad model presented lower AUC (0.704) than the four other models evaluated. Considering thresholds of ≥ 5%, the models sensitivity varied between 1 (Myriad) and 0.87 (Wijnen) and specificity ranged from 0 (Myriad) to 0.38 (Barnetson). The Barnetson, PREMM, MMRpro and Wijnen models present similar AUC. The AUC of the Myriad model is statistically inferior to the four other models

  14. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress

    Directory of Open Access Journals (Sweden)

    Ching-Hsue Cheng

    2018-01-01

    Full Text Available The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i the proposed model is different from the previous models lacking the concept of time series; (ii the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.

  15. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress

    Science.gov (United States)

    2018-01-01

    The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i) the proposed model is different from the previous models lacking the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies. PMID:29765399

  16. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress.

    Science.gov (United States)

    Cheng, Ching-Hsue; Chan, Chia-Pang; Yang, Jun-He

    2018-01-01

    The issue of financial distress prediction plays an important and challenging research topic in the financial field. Currently, there have been many methods for predicting firm bankruptcy and financial crisis, including the artificial intelligence and the traditional statistical methods, and the past studies have shown that the prediction result of the artificial intelligence method is better than the traditional statistical method. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed the nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages including the following: (i) the proposed model is different from the previous models lacking the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high dimensional data; and (iii) the proposed model can generate the rules and mathematical formulas of financial distress for providing references to the investors and decision makers. The result shows that the proposed method is better than the listing classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.

  17. The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection.

    Science.gov (United States)

    Tang, Zaixiang; Shen, Yueping; Zhang, Xinyan; Yi, Nengjun

    2017-01-01

    Large-scale "omics" data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, there are considerable challenges in analyzing high-dimensional molecular data, including the large number of potential molecular predictors, limited number of samples, and small effect of each predictor. We propose new Bayesian hierarchical generalized linear models, called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients, and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchal GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach integrates nice features of two popular methods, i.e., penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach can provide not only more accurate estimates of the parameters, but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors, and expression data of 4919 genes; and the ovarian cancer data set from TCGA with 362 tumors, and expression data of 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Copyright © 2017 by the Genetics Society of America.

  18. Boolean Dynamic Modeling Approaches to Study Plant Gene Regulatory Networks: Integration, Validation, and Prediction.

    Science.gov (United States)

    Velderraín, José Dávila; Martínez-García, Juan Carlos; Álvarez-Buylla, Elena R

    2017-01-01

    Mathematical models based on dynamical systems theory are well-suited tools for the integration of available molecular experimental data into coherent frameworks in order to propose hypotheses about the cooperative regulatory mechanisms driving developmental processes. Computational analysis of the proposed models using well-established methods enables testing the hypotheses by contrasting predictions with observations. Within such framework, Boolean gene regulatory network dynamical models have been extensively used in modeling plant development. Boolean models are simple and intuitively appealing, ideal tools for collaborative efforts between theorists and experimentalists. In this chapter we present protocols used in our group for the study of diverse plant developmental processes. We focus on conceptual clarity and practical implementation, providing directions to the corresponding technical literature.

  19. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction

    Directory of Open Access Journals (Sweden)

    Garzón-Martínez Gina A

    2012-04-01

    Full Text Available Abstract Background Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs, using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato and Solanum tuberosum (potato. We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the

  20. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction.

    Science.gov (United States)

    Garzón-Martínez, Gina A; Zhu, Z Iris; Landsman, David; Barrero, Luz S; Mariño-Ramírez, Leonardo

    2012-04-25

    Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S

  1. Predicting spatial and temporal gene expression using an integrative model of transcription factor occupancy and chromatin state.

    Directory of Open Access Journals (Sweden)

    Bartek Wilczynski

    Full Text Available Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converge to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal

  2. Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information.

    Science.gov (United States)

    Tang, Zaixiang; Shen, Yueping; Li, Yan; Zhang, Xinyan; Wen, Jia; Qian, Chen'ao; Zhuang, Wenzhuo; Shi, Xinghua; Yi, Nengjun

    2018-03-15

    Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces self-adaptive shrinkage amount on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchal GLMs, which can perform variable selection within groups. We assess the performance of the proposed method on several simulated scenarios, by varying the overlap among groups, group size, number of non-null groups, and the correlation within group. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). nyi@uab.edu. Supplementary data are available at Bioinformatics online.

  3. Multi-gene genetic programming based predictive models for municipal solid waste gasification in a fluidized bed gasifier.

    Science.gov (United States)

    Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold

    2015-03-01

    A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming are also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

    Science.gov (United States)

    Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian

    2018-02-23

    Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate the given gene products carrying out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from a massive set of GO terms as defined by GO is a difficult challenge. To combat with this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash firstly measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The codes of HPHash are available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. An Individual-Based Diploid Model Predicts Limited Conditions Under Which Stochastic Gene Expression Becomes Advantageous

    KAUST Repository

    Matsumoto, Tomotaka

    2015-11-24

    Recent studies suggest the existence of a stochasticity in gene expression (SGE) in many organisms, and its non-negligible effect on their phenotype and fitness. To date, however, how SGE affects the key parameters of population genetics are not well understood. SGE can increase the phenotypic variation and act as a load for individuals, if they are at the adaptive optimum in a stable environment. On the other hand, part of the phenotypic variation caused by SGE might become advantageous if individuals at the adaptive optimum become genetically less-adaptive, for example due to an environmental change. Furthermore, SGE of unimportant genes might have little or no fitness consequences. Thus, SGE can be advantageous, disadvantageous, or selectively neutral depending on its context. In addition, there might be a genetic basis that regulates magnitude of SGE, which is often referred to as “modifier genes,” but little is known about the conditions under which such an SGE-modifier gene evolves. In the present study, we conducted individual-based computer simulations to examine these conditions in a diploid model. In the simulations, we considered a single locus that determines organismal fitness for simplicity, and that SGE on the locus creates fitness variation in a stochastic manner. We also considered another locus that modifies the magnitude of SGE. Our results suggested that SGE was always deleterious in stable environments and increased the fixation probability of deleterious mutations in this model. Even under frequently changing environmental conditions, only very strong natural selection made SGE adaptive. These results suggest that the evolution of SGE-modifier genes requires strict balance among the strength of natural selection, magnitude of SGE, and frequency of environmental changes. However, the degree of dominance affected the condition under which SGE becomes advantageous, indicating a better opportunity for the evolution of SGE in different genetic

  6. A biological network-based regularized artificial neural network model for robust phenotype prediction from gene expression data.

    Science.gov (United States)

    Kang, Tianyu; Ding, Wei; Zhang, Luoyan; Ziemek, Daniel; Zarringhalam, Kourosh

    2017-12-19

    Stratification of patient subpopulations that respond favorably to treatment or experience and adverse reaction is an essential step toward development of new personalized therapies and diagnostics. It is currently feasible to generate omic-scale biological measurements for all patients in a study, providing an opportunity for machine learning models to identify molecular markers for disease diagnosis and progression. However, the high variability of genetic background in human populations hampers the reproducibility of omic-scale markers. In this paper, we develop a biological network-based regularized artificial neural network model for prediction of phenotype from transcriptomic measurements in clinical trials. To improve model sparsity and the overall reproducibility of the model, we incorporate regularization for simultaneous shrinkage of gene sets based on active upstream regulatory mechanisms into the model. We benchmark our method against various regression, support vector machines and artificial neural network models and demonstrate the ability of our method in predicting the clinical outcomes using clinical trial data on acute rejection in kidney transplantation and response to Infliximab in ulcerative colitis. We show that integration of prior biological knowledge into the classification as developed in this paper, significantly improves the robustness and generalizability of predictions to independent datasets. We provide a Java code of our algorithm along with a parsed version of the STRING DB database. In summary, we present a method for prediction of clinical phenotypes using baseline genome-wide expression data that makes use of prior biological knowledge on gene-regulatory interactions in order to increase robustness and reproducibility of omic-scale markers. The integrated group-wise regularization methods increases the interpretability of biological signatures and gives stable performance estimates across independent test sets.

  7. An Individual-Based Diploid Model Predicts Limited Conditions Under Which Stochastic Gene Expression Becomes Advantageous

    KAUST Repository

    Matsumoto, Tomotaka; Mineta, Katsuhiko; Osada, Naoki; Araki, Hitoshi

    2015-01-01

    Recent studies suggest the existence of a stochasticity in gene expression (SGE) in many organisms, and its non-negligible effect on their phenotype and fitness. To date, however, how SGE affects the key parameters of population genetics

  8. Bayesian mixture models for assessment of gene differential behaviour and prediction of pCR through the integration of copy number and gene expression data.

    Directory of Open Access Journals (Sweden)

    Filippo Trentini

    Full Text Available We consider modeling jointly microarray RNA expression and DNA copy number data. We propose Bayesian mixture models that define latent Gaussian probit scores for the DNA and RNA, and integrate between the two platforms via a regression of the RNA probit scores on the DNA probit scores. Such a regression conveniently allows us to include additional sample specific covariates such as biological conditions and clinical outcomes. The two developed methods are aimed respectively to make inference on differential behaviour of genes in patients showing different subtypes of breast cancer and to predict the pathological complete response (pCR of patients borrowing strength across the genomic platforms. Posterior inference is carried out via MCMC simulations. We demonstrate the proposed methodology using a published data set consisting of 121 breast cancer patients.

  9. Clinicopathologic and gene expression parameters predict liver cancer prognosis

    International Nuclear Information System (INIS)

    Hao, Ke; Zhong, Hua; Greenawalt, Danielle; Ferguson, Mark D; Ng, Irene O; Sham, Pak C; Poon, Ronnie T; Molony, Cliona; Schadt, Eric E; Dai, Hongyue; Luk, John M; Lamb, John; Zhang, Chunsheng; Xie, Tao; Wang, Kai; Zhang, Bin; Chudin, Eugene; Lee, Nikki P; Mao, Mao

    2011-01-01

    The prognosis of hepatocellular carcinoma (HCC) varies following surgical resection and the large variation remains largely unexplained. Studies have revealed the ability of clinicopathologic parameters and gene expression to predict HCC prognosis. However, there has been little systematic effort to compare the performance of these two types of predictors or combine them in a comprehensive model. Tumor and adjacent non-tumor liver tissues were collected from 272 ethnic Chinese HCC patients who received curative surgery. We combined clinicopathologic parameters and gene expression data (from both tissue types) in predicting HCC prognosis. Cross-validation and independent studies were employed to assess prediction. HCC prognosis was significantly associated with six clinicopathologic parameters, which can partition the patients into good- and poor-prognosis groups. Within each group, gene expression data further divide patients into distinct prognostic subgroups. Our predictive genes significantly overlap with previously published gene sets predictive of prognosis. Moreover, the predictive genes were enriched for genes that underwent normal-to-tumor gene network transformation. Previously documented liver eSNPs underlying the HCC predictive gene signatures were enriched for SNPs that associated with HCC prognosis, providing support that these genes are involved in key processes of tumorigenesis. When applied individually, clinicopathologic parameters and gene expression offered similar predictive power for HCC prognosis. In contrast, a combination of the two types of data dramatically improved the power to predict HCC prognosis. Our results also provided a framework for understanding the impact of gene expression on the processes of tumorigenesis and clinical outcome

  10. Vertebrate gene predictions and the problem of large genes

    DEFF Research Database (Denmark)

    Wang, Jun; Li, ShengTing; Zhang, Yong

    2003-01-01

    To find unknown protein-coding genes, annotation pipelines use a combination of ab initio gene prediction and similarity to experimentally confirmed genes or proteins. Here, we show that although the ab initio predictions have an intrinsically high false-positive rate, they also have a consistent...

  11. Exploring the Optimal Strategy to Predict Essential Genes in Microbes

    Directory of Open Access Journals (Sweden)

    Yao Lu

    2011-12-01

    Full Text Available Accurately predicting essential genes is important in many aspects of biology, medicine and bioengineering. In previous research, we have developed a machine learning based integrative algorithm to predict essential genes in bacterial species. This algorithm lends itself to two approaches for predicting essential genes: learning the traits from known essential genes in the target organism, or transferring essential gene annotations from a closely related model organism. However, for an understudied microbe, each approach has its potential limitations. The first is constricted by the often small number of known essential genes. The second is limited by the availability of model organisms and by evolutionary distance. In this study, we aim to determine the optimal strategy for predicting essential genes by examining four microbes with well-characterized essential genes. Our results suggest that, unless the known essential genes are few, learning from the known essential genes in the target organism usually outperforms transferring essential gene annotations from a related model organism. In fact, the required number of known essential genes is surprisingly small to make accurate predictions. In prokaryotes, when the number of known essential genes is greater than 2% of total genes, this approach already comes close to its optimal performance. In eukaryotes, achieving the same best performance requires over 4% of total genes, reflecting the increased complexity of eukaryotic organisms. Combining the two approaches resulted in an increased performance when the known essential genes are few. Our investigation thus provides key information on accurately predicting essential genes and will greatly facilitate annotations of microbial genomes.

  12. Predicting Hydrologic Function With Aquatic Gene Fragments

    Science.gov (United States)

    Good, S. P.; URycki, D. R.; Crump, B. C.

    2018-03-01

    Recent advances in microbiology techniques, such as genetic sequencing, allow for rapid and cost-effective collection of large quantities of genetic information carried within water samples. Here we posit that the unique composition of aquatic DNA material within a water sample contains relevant information about hydrologic function at multiple temporal scales. In this study, machine learning was used to develop discharge prediction models trained on the relative abundance of bacterial taxa classified into operational taxonomic units (OTUs) based on 16S rRNA gene sequences from six large arctic rivers. We term this approach "genohydrology," and show that OTU relative abundances can be used to predict river discharge at monthly and longer timescales. Based on a single DNA sample from each river, the average Nash-Sutcliffe efficiency (NSE) for predicted mean monthly discharge values throughout the year was 0.84, while the NSE for predicted discharge values across different return intervals was 0.67. These are considerable improvements over predictions based only on the area-scaled mean specific discharge of five similar rivers, which had average NSE values of 0.64 and -0.32 for seasonal and recurrence interval discharge values, respectively. The genohydrology approach demonstrates that genetic diversity within the aquatic microbiome is a large and underutilized data resource with benefits for prediction of hydrologic function.

  13. Predictive modeling of complications.

    Science.gov (United States)

    Osorio, Joseph A; Scheer, Justin K; Ames, Christopher P

    2016-09-01

    Predictive analytic algorithms are designed to identify patterns in the data that allow for accurate predictions without the need for a hypothesis. Therefore, predictive modeling can provide detailed and patient-specific information that can be readily applied when discussing the risks of surgery with a patient. There are few studies using predictive modeling techniques in the adult spine surgery literature. These types of studies represent the beginning of the use of predictive analytics in spine surgery outcomes. We will discuss the advancements in the field of spine surgery with respect to predictive analytics, the controversies surrounding the technique, and the future directions.

  14. Predicting cellular growth from gene expression signatures.

    Directory of Open Access Journals (Sweden)

    Edoardo M Airoldi

    2009-01-01

    Full Text Available Maintaining balanced growth in a changing environment is a fundamental systems-level challenge for cellular physiology, particularly in microorganisms. While the complete set of regulatory and functional pathways supporting growth and cellular proliferation are not yet known, portions of them are well understood. In particular, cellular proliferation is governed by mechanisms that are highly conserved from unicellular to multicellular organisms, and the disruption of these processes in metazoans is a major factor in the development of cancer. In this paper, we develop statistical methodology to identify quantitative aspects of the regulatory mechanisms underlying cellular proliferation in Saccharomyces cerevisiae. We find that the expression levels of a small set of genes can be exploited to predict the instantaneous growth rate of any cellular culture with high accuracy. The predictions obtained in this fashion are robust to changing biological conditions, experimental methods, and technological platforms. The proposed model is also effective in predicting growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the gene expression signature that the predictions are based upon from multiple perspectives: by perturbing the regulatory network through the Ras/PKA pathway, observing strong upregulation of growth rate even in the absence of appropriate nutrients, and discovering putative transcription factor binding sites, observing enrichment in growth-correlated genes. More broadly, the proposed methodology enables biological insights about growth at an instantaneous time scale, inaccessible by direct experimental methods. Data and tools enabling others to apply our methods are available at http://function.princeton.edu/growthrate.

  15. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    Full Text Available This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H for the highly heritable traits, days to heading (DTH, and days to maturity (DTM. Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E. Two alternative prediction strategies were studied: (1 random cross-validation of the data in 20% training (TRN and 80% testing (TST (TRN20-TST80 sets, and (2 two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm

  16. Archaeological predictive model set.

    Science.gov (United States)

    2015-03-01

    This report is the documentation for Task 7 of the Statewide Archaeological Predictive Model Set. The goal of this project is to : develop a set of statewide predictive models to assist the planning of transportation projects. PennDOT is developing t...

  17. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

    Science.gov (United States)

    Zhou, Hang; Yang, Yang; Shen, Hong-Bin

    2017-03-15

    Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degenerate the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F 1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals proteins co-localization preferences in the cell. www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  18. Gene prediction validation and functional analysis of redundant pathways

    DEFF Research Database (Denmark)

    Sønderkær, Mads

    2011-01-01

    have employed a large mRNA-seq data set to improve and validate ab initio predicted gene models. This direct experimental evidence also provides reliable determinations of UTR regions and polyadenylation sites, which are not easily predicted in plants. Furthermore, once an annotated genome sequence...... is available, gene expression by mRNA-Seq enables acquisition of a more complete overview of gene isoform usage in complex enzymatic pathways enabling the identification of key genes. Metabolism in potatoes This information is useful e.g. for crop improvement based on manipulation of agronomically important...

  19. Wind power prediction models

    Science.gov (United States)

    Levy, R.; Mcginness, H.

    1976-01-01

    Investigations were performed to predict the power available from the wind at the Goldstone, California, antenna site complex. The background for power prediction was derived from a statistical evaluation of available wind speed data records at this location and at nearby locations similarly situated within the Mojave desert. In addition to a model for power prediction over relatively long periods of time, an interim simulation model that produces sample wind speeds is described. The interim model furnishes uncorrelated sample speeds at hourly intervals that reproduce the statistical wind distribution at Goldstone. A stochastic simulation model to provide speed samples representative of both the statistical speed distributions and correlations is also discussed.

  20. Building and validating a prediction model for paediatric type 1 diabetes risk using next generation targeted sequencing of class II HLA genes.

    Science.gov (United States)

    Zhao, Lue Ping; Carlsson, Annelie; Larsson, Helena Elding; Forsander, Gun; Ivarsson, Sten A; Kockum, Ingrid; Ludvigsson, Johnny; Marcus, Claude; Persson, Martina; Samuelsson, Ulf; Örtqvist, Eva; Pyo, Chul-Woo; Bolouri, Hamid; Zhao, Michael; Nelson, Wyatt C; Geraghty, Daniel E; Lernmark, Åke

    2017-11-01

    It is of interest to predict possible lifetime risk of type 1 diabetes (T1D) in young children for recruiting high-risk subjects into longitudinal studies of effective prevention strategies. Utilizing a case-control study in Sweden, we applied a recently developed next generation targeted sequencing technology to genotype class II genes and applied an object-oriented regression to build and validate a prediction model for T1D. In the training set, estimated risk scores were significantly different between patients and controls (P = 8.12 × 10 -92 ), and the area under the curve (AUC) from the receiver operating characteristic (ROC) analysis was 0.917. Using the validation data set, we validated the result with AUC of 0.886. Combining both training and validation data resulted in a predictive model with AUC of 0.903. Further, we performed a "biological validation" by correlating risk scores with 6 islet autoantibodies, and found that the risk score was significantly correlated with IA-2A (Z-score = 3.628, P < 0.001). When applying this prediction model to the Swedish population, where the lifetime T1D risk ranges from 0.5% to 2%, we anticipate identifying approximately 20 000 high-risk subjects after testing all newborns, and this calculation would identify approximately 80% of all patients expected to develop T1D in their lifetime. Through both empirical and biological validation, we have established a prediction model for estimating lifetime T1D risk, using class II HLA. This prediction model should prove useful for future investigations to identify high-risk subjects for prevention research in high-risk populations. Copyright © 2017 John Wiley & Sons, Ltd.

  1. Neural Inductive Matrix Completion for Predicting Disease-Gene Associations

    KAUST Repository

    Hou, Siqing

    2018-05-21

    In silico prioritization of undiscovered associations can help find causal genes of newly discovered diseases. Some existing methods are based on known associations, and side information of diseases and genes. We exploit the possibility of using a neural network model, Neural inductive matrix completion (NIMC), in disease-gene prediction. Comparing to the state-of-the-art inductive matrix completion method, using neural networks allows us to learn latent features from non-linear functions of input features. Previous methods use disease features only from mining text. Comparing to text mining, disease ontology is a more informative way of discovering correlation of dis- eases, from which we can calculate the similarities between diseases and help increase the performance of predicting disease-gene associations. We compare the proposed method with other state-of-the-art methods for pre- dicting associated genes for diseases from the Online Mendelian Inheritance in Man (OMIM) database. Results show that both new features and the proposed NIMC model can improve the chance of recovering an unknown associated gene in the top 100 predicted genes. Best results are obtained by using both the new features and the new model. Results also show the proposed method does better in predicting associated genes for newly discovered diseases.

  2. Inverse and Predictive Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Syracuse, Ellen Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-09-27

    The LANL Seismo-Acoustic team has a strong capability in developing data-driven models that accurately predict a variety of observations. These models range from the simple – one-dimensional models that are constrained by a single dataset and can be used for quick and efficient predictions – to the complex – multidimensional models that are constrained by several types of data and result in more accurate predictions. Team members typically build models of geophysical characteristics of Earth and source distributions at scales of 1 to 1000s of km, the techniques used are applicable for other types of physical characteristics at an even greater range of scales. The following cases provide a snapshot of some of the modeling work done by the Seismo- Acoustic team at LANL.

  3. Global discriminative learning for higher-accuracy computational gene prediction.

    Directory of Open Access Journals (Sweden)

    Axel Bernal

    2007-03-01

    Full Text Available Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.

  4. Combining gene prediction methods to improve metagenomic gene annotation

    Directory of Open Access Journals (Sweden)

    Rosen Gail L

    2011-01-01

    Full Text Available Abstract Background Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation. Results We not only analyze the programs' performance at different read-lengths like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false-positives and false-negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in 4% improvement in accuracy for 100 bp reads with ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset. Conclusions To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding reads on a real human gut sample sequenced by Illumina technology.

  5. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    Science.gov (United States)

    Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

    2011-01-01

    Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast

  6. Combining gene signatures improves prediction of breast cancer survival.

    Directory of Open Access Journals (Sweden)

    Xi Zhao

    Full Text Available BACKGROUND: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123 and test set (n = 81, respectively. Gene sets from eleven previously published gene signatures are included in the study. PRINCIPAL FINDINGS: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014. Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001. The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. CONCLUSION: Combining the predictive strength of multiple gene signatures improves

  7. Cultural Resource Predictive Modeling

    Science.gov (United States)

    2017-10-01

    CR cultural resource CRM cultural resource management CRPM Cultural Resource Predictive Modeling DoD Department of Defense ESTCP Environmental...resource management ( CRM ) legal obligations under NEPA and the NHPA, military installations need to demonstrate that CRM decisions are based on objective...maxim “one size does not fit all,” and demonstrate that DoD installations have many different CRM needs that can and should be met through a variety

  8. Candidate Prediction Models and Methods

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg; Nielsen, Torben Skov; Madsen, Henrik

    2005-01-01

    This document lists candidate prediction models for Work Package 3 (WP3) of the PSO-project called ``Intelligent wind power prediction systems'' (FU4101). The main focus is on the models transforming numerical weather predictions into predictions of power production. The document also outlines...... the possibilities w.r.t. different numerical weather predictions actually available to the project....

  9. Gene Prediction in Metagenomic Fragments with Deep Learning

    Directory of Open Access Journals (Sweden)

    Shao-Wu Zhang

    2017-01-01

    Full Text Available Next generation sequencing technologies used in metagenomics yield numerous sequencing fragments which come from thousands of different species. Accurately identifying genes from metagenomics fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multifeatures (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features and using deep stacking networks learning model, we present a novel method (called Meta-MFDL to predict the metagenomic genes. The results with 10 CV and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.

  10. Blood Gene Expression Predicts Bronchiolitis Obliterans Syndrome

    Directory of Open Access Journals (Sweden)

    Richard Danger

    2018-01-01

    Full Text Available Bronchiolitis obliterans syndrome (BOS, the main manifestation of chronic lung allograft dysfunction, leads to poor long-term survival after lung transplantation. Identifying predictors of BOS is essential to prevent the progression of dysfunction before irreversible damage occurs. By using a large set of 107 samples from lung recipients, we performed microarray gene expression profiling of whole blood to identify early biomarkers of BOS, including samples from 49 patients with stable function for at least 3 years, 32 samples collected at least 6 months before BOS diagnosis (prediction group, and 26 samples at or after BOS diagnosis (diagnosis group. An independent set from 25 lung recipients was used for validation by quantitative PCR (13 stables, 11 in the prediction group, and 8 in the diagnosis group. We identified 50 transcripts differentially expressed between stable and BOS recipients. Three genes, namely POU class 2 associating factor 1 (POU2AF1, T-cell leukemia/lymphoma protein 1A (TCL1A, and B cell lymphocyte kinase, were validated as predictive biomarkers of BOS more than 6 months before diagnosis, with areas under the curve of 0.83, 0.77, and 0.78 respectively. These genes allow stratification based on BOS risk (log-rank test p < 0.01 and are not associated with time posttransplantation. This is the first published large-scale gene expression analysis of blood after lung transplantation. The three-gene blood signature could provide clinicians with new tools to improve follow-up and adapt treatment of patients likely to develop BOS.

  11. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    International Nuclear Information System (INIS)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-01-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society

  12. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    Energy Technology Data Exchange (ETDEWEB)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.

  13. Predictive Surface Complexation Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Sverjensky, Dimitri A. [Johns Hopkins Univ., Baltimore, MD (United States). Dept. of Earth and Planetary Sciences

    2016-11-29

    Surface complexation plays an important role in the equilibria and kinetics of processes controlling the compositions of soilwaters and groundwaters, the fate of contaminants in groundwaters, and the subsurface storage of CO2 and nuclear waste. Over the last several decades, many dozens of individual experimental studies have addressed aspects of surface complexation that have contributed to an increased understanding of its role in natural systems. However, there has been no previous attempt to develop a model of surface complexation that can be used to link all the experimental studies in order to place them on a predictive basis. Overall, my research has successfully integrated the results of the work of many experimentalists published over several decades. For the first time in studies of the geochemistry of the mineral-water interface, a practical predictive capability for modeling has become available. The predictive correlations developed in my research now enable extrapolations of experimental studies to provide estimates of surface chemistry for systems not yet studied experimentally and for natural and anthropogenically perturbed systems.

  14. Association of a four-locus gene model including IL13, IL4, FCER1B, and ADRB2 with the Asthma Predictive Index and atopy in Chinese Han children.

    Science.gov (United States)

    Bai, S; Hua, L; Wang, X; Liu, Q; Bao, Y

    2018-05-11

    Asthma is a complex and heterogeneous disease. We found that gene-gene interactions among IL13 rs20541, IL4 rs2243250, ADRB2 rs1042713, and FCER1B rs569108 in asthmatic children of Chinese Han nationality. This four-locus set constituted an optimal statistical interaction model. Objective: This study examined associations of the four-gene model consisting of IL13, IL4, FCER1B, and ADRB2 with the Asthma Predictive Index (API) and atopy in Chinese Han children. Four single-nucleotide polymorphisms (SNPs) in the four genes were genotyped in 385 preschool children with wheezing symptoms using matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Student's t test and x2 tests were used for this analysis. : Significant correlations were found between the four-locus gene model and the stringent and loose API (both Pfour-locus gene model with atopy (Pfour-locus gene model consisting of L13 rs20541, IL4 rs2243250, ADRB2 rs1042713 and FCER1B rs569108 was associated with the API and atopy. These findings provide an evidence of the gene model for determining a high risk of developing asthma and atopy in Chinese Han children.

  15. Ecological transition predictably associated with gene degeneration.

    Science.gov (United States)

    Wessinger, Carolyn A; Rausher, Mark D

    2015-02-01

    Gene degeneration or loss can significantly contribute to phenotypic diversification, but may generate genetic constraints on future evolutionary trajectories, potentially restricting phenotypic reversal. Such constraints may manifest as directional evolutionary trends when parallel phenotypic shifts consistently involve gene degeneration or loss. Here, we demonstrate that widespread parallel evolution in Penstemon from blue to red flowers predictably involves the functional inactivation and degeneration of the enzyme flavonoid 3',5'-hydroxylase (F3'5'H), an anthocyanin pathway enzyme required for the production of blue floral pigments. Other types of genetic mutations do not consistently accompany this phenotypic shift. This pattern may be driven by the relatively large mutational target size of degenerative mutations to this locus and the apparent lack of associated pleiotropic effects. The consistent degeneration of F3'5'H may provide a mechanistic explanation for the observed asymmetry in the direction of flower color evolution in Penstemon: Blue to red transitions are common, but reverse transitions have not been observed. Although phenotypic shifts in this system are likely driven by natural selection, internal constraints may generate predictable genetic outcomes and may restrict future evolutionary trajectories. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. A comparative analysis of soft computing techniques for gene prediction.

    Science.gov (United States)

    Goel, Neelam; Singh, Shailendra; Aseri, Trilok Chand

    2013-07-01

    The rapid growth of genomic sequence data for both human and nonhuman species has made analyzing these sequences, especially predicting genes in them, very important and is currently the focus of many research efforts. Beside its scientific interest in the molecular biology and genomics community, gene prediction is of considerable importance in human health and medicine. A variety of gene prediction techniques have been developed for eukaryotes over the past few years. This article reviews and analyzes the application of certain soft computing techniques in gene prediction. First, the problem of gene prediction and its challenges are described. These are followed by different soft computing techniques along with their application to gene prediction. In addition, a comparative analysis of different soft computing techniques for gene prediction is given. Finally some limitations of the current research activities and future research directions are provided. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Confidence scores for prediction models

    DEFF Research Database (Denmark)

    Gerds, Thomas Alexander; van de Wiel, MA

    2011-01-01

    In medical statistics, many alternative strategies are available for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance in independent validation data. If only one data set is available for training and validation,...

  18. An algorithm to discover gene signatures with predictive potential

    Directory of Open Access Journals (Sweden)

    Hallett Robin M

    2010-09-01

    Full Text Available Abstract Background The advent of global gene expression profiling has generated unprecedented insight into our molecular understanding of cancer, including breast cancer. For example, human breast cancer patients display significant diversity in terms of their survival, recurrence, metastasis as well as response to treatment. These patient outcomes can be predicted by the transcriptional programs of their individual breast tumors. Predictive gene signatures allow us to correctly classify human breast tumors into various risk groups as well as to more accurately target therapy to ensure more durable cancer treatment. Results Here we present a novel algorithm to generate gene signatures with predictive potential. The method first classifies the expression intensity for each gene as determined by global gene expression profiling as low, average or high. The matrix containing the classified data for each gene is then used to score the expression of each gene based its individual ability to predict the patient characteristic of interest. Finally, all examined genes are ranked based on their predictive ability and the most highly ranked genes are included in the master gene signature, which is then ready for use as a predictor. This method was used to accurately predict the survival outcomes in a cohort of human breast cancer patients. Conclusions We confirmed the capacity of our algorithm to generate gene signatures with bona fide predictive ability. The simplicity of our algorithm will enable biological researchers to quickly generate valuable gene signatures without specialized software or extensive bioinformatics training.

  19. PREDICTED PERCENTAGE DISSATISFIED (PPD) MODEL ...

    African Journals Online (AJOL)

    HOD

    their low power requirements, are relatively cheap and are environment friendly. ... PREDICTED PERCENTAGE DISSATISFIED MODEL EVALUATION OF EVAPORATIVE COOLING ... The performance of direct evaporative coolers is a.

  20. MGMT methylation analysis of glioblastoma on the Infinium methylation BeadChip identifies two distinct CpG regions associated with gene silencing and outcome, yielding a prediction model for comparisons across datasets, tumor grades, and CIMP-status.

    Science.gov (United States)

    Bady, Pierre; Sciuscio, Davide; Diserens, Annie-Claire; Bloch, Jocelyne; van den Bent, Martin J; Marosi, Christine; Dietrich, Pierre-Yves; Weller, Michael; Mariani, Luigi; Heppner, Frank L; Mcdonald, David R; Lacombe, Denis; Stupp, Roger; Delorenzi, Mauro; Hegi, Monika E

    2012-10-01

    The methylation status of the O(6)-methylguanine-DNA methyltransferase (MGMT) gene is an important predictive biomarker for benefit from alkylating agent therapy in glioblastoma. Recent studies in anaplastic glioma suggest a prognostic value for MGMT methylation. Investigation of pathogenetic and epigenetic features of this intriguingly distinct behavior requires accurate MGMT classification to assess high throughput molecular databases. Promoter methylation-mediated gene silencing is strongly dependent on the location of the methylated CpGs, complicating classification. Using the HumanMethylation450 (HM-450K) BeadChip interrogating 176 CpGs annotated for the MGMT gene, with 14 located in the promoter, two distinct regions in the CpG island of the promoter were identified with high importance for gene silencing and outcome prediction. A logistic regression model (MGMT-STP27) comprising probes cg12434587 [corrected] and cg12981137 provided good classification properties and prognostic value (kappa = 0.85; log-rank p CIMP) positive tumors was found in glioblastomas from The Cancer Genome Atlas than in low grade and anaplastic glioma cohorts, while in CIMP-negative gliomas MGMT was classified as methylated in approximately 50 % regardless of tumor grade. The proposed MGMT-STP27 prediction model allows mining of datasets derived on the HM-450K or HM-27K BeadChip to explore effects of distinct epigenetic context of MGMT methylation suspected to modulate treatment resistance in different tumor types.

  1. Bootstrap prediction and Bayesian prediction under misspecified models

    OpenAIRE

    Fushiki, Tadayoshi

    2005-01-01

    We consider a statistical prediction problem under misspecified models. In a sense, Bayesian prediction is an optimal prediction method when an assumed model is true. Bootstrap prediction is obtained by applying Breiman's `bagging' method to a plug-in prediction. Bootstrap prediction can be considered to be an approximation to the Bayesian prediction under the assumption that the model is true. However, in applications, there are frequently deviations from the assumed model. In this paper, bo...

  2. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model, that model the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...

  3. A genome-wide gene function prediction resource for Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Han Yan

    2010-08-01

    Full Text Available Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.

  4. MODEL PREDICTIVE CONTROL FUNDAMENTALS

    African Journals Online (AJOL)

    2012-07-02

    Jul 2, 2012 ... signal based on a process model, coping with constraints on inputs and ... paper, we will present an introduction to the theory and application of MPC with Matlab codes ... section 5 presents the simulation results and section 6.

  5. Melanoma Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing melanoma cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  6. Modelling bankruptcy prediction models in Slovak companies

    Directory of Open Access Journals (Sweden)

    Kovacova Maria

    2017-01-01

    Full Text Available An intensive research from academics and practitioners has been provided regarding models for bankruptcy prediction and credit risk management. In spite of numerous researches focusing on forecasting bankruptcy using traditional statistics techniques (e.g. discriminant analysis and logistic regression and early artificial intelligence models (e.g. artificial neural networks, there is a trend for transition to machine learning models (support vector machines, bagging, boosting, and random forest to predict bankruptcy one year prior to the event. Comparing the performance of this with unconventional approach with results obtained by discriminant analysis, logistic regression, and neural networks application, it has been found that bagging, boosting, and random forest models outperform the others techniques, and that all prediction accuracy in the testing sample improves when the additional variables are included. On the other side the prediction accuracy of old and well known bankruptcy prediction models is quiet high. Therefore, we aim to analyse these in some way old models on the dataset of Slovak companies to validate their prediction ability in specific conditions. Furthermore, these models will be modelled according to new trends by calculating the influence of elimination of selected variables on the overall prediction ability of these models.

  7. Predictive models of moth development

    Science.gov (United States)

    Degree-day models link ambient temperature to insect life-stages, making such models valuable tools in integrated pest management. These models increase management efficacy by predicting pest phenology. In Wisconsin, the top insect pest of cranberry production is the cranberry fruitworm, Acrobasis v...

  8. Predictive Models and Computational Embryology

    Science.gov (United States)

    EPA’s ‘virtual embryo’ project is building an integrative systems biology framework for predictive models of developmental toxicity. One schema involves a knowledge-driven adverse outcome pathway (AOP) framework utilizing information from public databases, standardized ontologies...

  9. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    to investigate locomotor activity, and applied genomic feature prediction models to identify gene ontology (GO) cate- gories predictive of this phenotype. Next, we applied the covariance association test to partition the genomic variance of the predictive GO terms to the genes within these terms. We...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated......Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...

  10. Predictive Modeling in Race Walking

    Directory of Open Access Journals (Sweden)

    Krzysztof Wiktorowicz

    2015-01-01

    Full Text Available This paper presents the use of linear and nonlinear multivariable models as tools to support training process of race walkers. These models are calculated using data collected from race walkers’ training events and they are used to predict the result over a 3 km race based on training loads. The material consists of 122 training plans for 21 athletes. In order to choose the best model leave-one-out cross-validation method is used. The main contribution of the paper is to propose the nonlinear modifications for linear models in order to achieve smaller prediction error. It is shown that the best model is a modified LASSO regression with quadratic terms in the nonlinear part. This model has the smallest prediction error and simplified structure by eliminating some of the predictors.

  11. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    Science.gov (United States)

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  12. Testing the predictive value of peripheral gene expression for nonremission following citalopram treatment for major depression.

    Science.gov (United States)

    Guilloux, Jean-Philippe; Bassi, Sabrina; Ding, Ying; Walsh, Chris; Turecki, Gustavo; Tseng, George; Cyranowski, Jill M; Sibille, Etienne

    2015-02-01

    Major depressive disorder (MDD) in general, and anxious-depression in particular, are characterized by poor rates of remission with first-line treatments, contributing to the chronic illness burden suffered by many patients. Prospective research is needed to identify the biomarkers predicting nonremission prior to treatment initiation. We collected blood samples from a discovery cohort of 34 adult MDD patients with co-occurring anxiety and 33 matched, nondepressed controls at baseline and after 12 weeks (of citalopram plus psychotherapy treatment for the depressed cohort). Samples were processed on gene arrays and group differences in gene expression were investigated. Exploratory analyses suggest that at pretreatment baseline, nonremitting patients differ from controls with gene function and transcription factor analyses potentially related to elevated inflammation and immune activation. In a second phase, we applied an unbiased machine learning prediction model and corrected for model-selection bias. Results show that baseline gene expression predicted nonremission with 79.4% corrected accuracy with a 13-gene model. The same gene-only model predicted nonremission after 8 weeks of citalopram treatment with 76% corrected accuracy in an independent validation cohort of 63 MDD patients treated with citalopram at another institution. Together, these results demonstrate the potential, but also the limitations, of baseline peripheral blood-based gene expression to predict nonremission after citalopram treatment. These results not only support their use in future prediction tools but also suggest that increased accuracy may be obtained with the inclusion of additional predictors (eg, genetics and clinical scales).

  13. A network approach to predict pathogenic genes for Fusarium graminearum.

    Science.gov (United States)

    Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan

    2010-10-04

    Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), which is a destructive disease on wheat and barley, thereby causing huge economic loss and health problems to human by contaminating foods. Identifying pathogenic genes can shed light on pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which imply that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including G protein coupled receptor pathway and MAPK signaling pathway, which are known related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which

  14. Semi-supervised prediction of gene regulatory networks using ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging ... two types of methods differ primarily based on whether ..... negligible, allowing us to draw the qualitative conclusions .... research will be conducted to develop additional biologically.

  15. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning.

    Science.gov (United States)

    He, Zhili; Zhang, Ping; Wu, Linwei; Rocha, Andrea M; Tu, Qichao; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D; Wu, Liyou; Yang, Yunfeng; Elias, Dwayne A; Watson, David B; Adams, Michael W W; Fields, Matthew W; Alm, Eric J; Hazen, Terry C; Adams, Paul D; Arkin, Adam P; Zhou, Jizhong

    2018-02-20

    Contamination from anthropogenic activities has significantly impacted Earth's biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly ( P contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. IMPORTANCE Disentangling the relationships between biodiversity and ecosystem functioning is an important but poorly understood topic in ecology. Predicting ecosystem functioning on the basis of biodiversity is even more difficult, particularly with microbial biomarkers. As an exploratory effort, this study used key microbial functional genes as biomarkers to provide predictive understanding of environmental contamination and ecosystem functioning. The results indicated that the overall functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as

  16. A stepwise model to predict monthly streamflow

    Science.gov (United States)

    Mahmood Al-Juboori, Anas; Guven, Aytac

    2016-12-01

    In this study, a stepwise model empowered with genetic programming is developed to predict the monthly flows of Hurman River in Turkey and Diyalah and Lesser Zab Rivers in Iraq. The model divides the monthly flow data to twelve intervals representing the number of months in a year. The flow of a month, t is considered as a function of the antecedent month's flow (t - 1) and it is predicted by multiplying the antecedent monthly flow by a constant value called K. The optimum value of K is obtained by a stepwise procedure which employs Gene Expression Programming (GEP) and Nonlinear Generalized Reduced Gradient Optimization (NGRGO) as alternative to traditional nonlinear regression technique. The degree of determination and root mean squared error are used to evaluate the performance of the proposed models. The results of the proposed model are compared with the conventional Markovian and Auto Regressive Integrated Moving Average (ARIMA) models based on observed monthly flow data. The comparison results based on five different statistic measures show that the proposed stepwise model performed better than Markovian model and ARIMA model. The R2 values of the proposed model range between 0.81 and 0.92 for the three rivers in this study.

  17. Predictability of Genetic Interactions from Functional Gene Modules

    Directory of Open Access Journals (Sweden)

    Jonathan H. Young

    2017-02-01

    Full Text Available Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.

  18. Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data

    Directory of Open Access Journals (Sweden)

    Teng Shaolei

    2013-01-01

    Full Text Available Abstract Background Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. Results In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs and Support Vector Machines (SVMs were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. Conclusions A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression.

  19. Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

    Science.gov (United States)

    Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

    2014-01-01

    Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.

  20. Fusion Genes Predict Prostate Cancer Recurrence

    Science.gov (United States)

    2017-10-01

    Stanford University. The UPMC cohort includes up to 1900 well-annotated and -followed radical prostatectomy samples. University of Wisconsin will provide...probes, animal models); • clinical interventions; • new business creation; and • other. 7. PARTICIPANTS & OTHER COLLABORATING

  1. Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles

    Directory of Open Access Journals (Sweden)

    Liying Yang

    2016-01-01

    Full Text Available Background. Precisely predicting cancer is crucial for cancer treatment. Gene expression profiles make it possible to analyze patterns between genes and cancers on the genome-wide scale. Gene expression data analysis, however, is confronted with enormous challenges for its characteristics, such as high dimensionality, small sample size, and low Signal-to-Noise Ratio. Results. This paper proposes a method, termed RS_SVM, to predict gene expression profiles via aggregating SVM trained on random subspaces. After choosing gene features through statistical analysis, RS_SVM randomly selects feature subsets to yield random subspaces and training SVM classifiers accordingly and then aggregates SVM classifiers to capture the advantage of ensemble learning. Experiments on eight real gene expression datasets are performed to validate the RS_SVM method. Experimental results show that RS_SVM achieved better classification accuracy and generalization performance in contrast with single SVM, K-nearest neighbor, decision tree, Bagging, AdaBoost, and the state-of-the-art methods. Experiments also explored the effect of subspace size on prediction performance. Conclusions. The proposed RS_SVM method yielded superior performance in analyzing gene expression profiles, which demonstrates that RS_SVM provides a good channel for such biological data.

  2. Clustering gene expression data based on predicted differential effects of GV interaction.

    Science.gov (United States)

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce bias interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

  3. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    Science.gov (United States)

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigation of essential genes is significant to comprehend the minimal gene sets of cell and discover potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning method was introduced to predict essential genes. We focused on 25 bacteria which have characterized essential genes. The predictions yielded the highest area under receiver operating characteristic (ROC) curve (AUC) of 0.9716 through tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated via the consistency of predictions and known essential genes of target species. The highest AUC of 0.9552 and average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus , which was released recently, was obtained for further assessment of the performance of our model. The AUC score of predictions is 0.7855, which is higher than other methods. This research presents that features obtained by homology mapping uniquely can achieve quite great or even better results than those integrated features. Meanwhile, the work indicates that machine learning-based method can assign more efficient weight coefficients than using empirical formula based on biological knowledge.

  4. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    Directory of Open Access Journals (Sweden)

    He Cui

    2017-02-01

    Full Text Available Our previous study demonstrated that human KIAA0100 gene was a novel acute monocytic leukemia-associated antigen (MLAA gene. But the functional characterization of human KIAA0100 gene has remained unknown to date. Here, firstly, bioinformatic prediction of human KIAA0100 gene was carried out using online softwares; Secondly, Human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR/CRISPR-associated (Cas 9 system in U937 cells. Cell proliferation and apoptosis were next evaluated in KIAA0100-knockdown U937 cells. The bioinformatic prediction showed that human KIAA0100 gene was located on 17q11.2, and human KIAA0100 protein was located in the secretory pathway. Besides, human KIAA0100 protein contained a signalpeptide, a transmembrane region, three types of secondary structures (alpha helix, extended strand, and random coil , and four domains from mitochondrial protein 27 (FMP27. The observation on functional characterization of human KIAA0100 gene revealed that its downregulation inhibited cell proliferation, and promoted cell apoptosis in U937 cells. To summarize, these results suggest human KIAA0100 gene possibly comes within mitochondrial genome; moreover, it is a novel anti-apoptotic factor related to carcinogenesis or progression in acute monocytic leukemia, and may be a potential target for immunotherapy against acute monocytic leukemia.

  5. A network approach to predict pathogenic genes for Fusarium graminearum.

    Directory of Open Access Journals (Sweden)

    Xiaoping Liu

    Full Text Available Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB, which is a destructive disease on wheat and barley, thereby causing huge economic loss and health problems to human by contaminating foods. Identifying pathogenic genes can shed light on pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which imply that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including G protein coupled receptor pathway and MAPK signaling pathway, which are known related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other

  6. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    Science.gov (United States)

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  7. Gene-specific function prediction for non-synonymous mutations in monogenic diabetes genes.

    Directory of Open Access Journals (Sweden)

    Quan Li

    Full Text Available The rapid progress of genomic technologies has been providing new opportunities to address the need of maturity-onset diabetes of the young (MODY molecular diagnosis. However, whether a new mutation causes MODY can be questionable. A number of in silico methods have been developed to predict functional effects of rare human mutations. The purpose of this study is to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and provides reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores by different methods of the diabetes mutations were highly correlated, but were more complimentary than replacement to each other. The available in silico methods for the prediction of diabetes mutations had varied performances across different genes. Applying gene-specific thresholds defined by this study may be able to increase the performance of in silico prediction of disease-causing mutations.

  8. Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Osamu Komori

    2013-01-01

    Full Text Available This paper discusses mathematical and statistical aspects in analysis methods applied to microarray gene expressions. We focus on pattern recognition to extract informative features embedded in the data for prediction of phenotypes. It has been pointed out that there are severely difficult problems due to the unbalance in the number of observed genes compared with the number of observed subjects. We make a reanalysis of microarray gene expression published data to detect many other gene sets with almost the same performance. We conclude in the current stage that it is not possible to extract only informative genes with high performance in the all observed genes. We investigate the reason why this difficulty still exists even though there are actively proposed analysis methods and learning algorithms in statistical machine learning approaches. We focus on the mutual coherence or the absolute value of the Pearson correlations between two genes and describe the distributions of the correlation for the selected set of genes and the total set. We show that the problem of finding informative genes in high dimensional data is ill-posed and that the difficulty is closely related with the mutual coherence.

  9. Breast cancer risks and risk prediction models.

    Science.gov (United States)

    Engel, Christoph; Fischer, Christine

    2015-02-01

    BRCA1/2 mutation carriers have a considerably increased risk to develop breast and ovarian cancer. The personalized clinical management of carriers and other at-risk individuals depends on precise knowledge of the cancer risks. In this report, we give an overview of the present literature on empirical cancer risks, and we describe risk prediction models that are currently used for individual risk assessment in clinical practice. Cancer risks show large variability between studies. Breast cancer risks are at 40-87% for BRCA1 mutation carriers and 18-88% for BRCA2 mutation carriers. For ovarian cancer, the risk estimates are in the range of 22-65% for BRCA1 and 10-35% for BRCA2. The contralateral breast cancer risk is high (10-year risk after first cancer 27% for BRCA1 and 19% for BRCA2). Risk prediction models have been proposed to provide more individualized risk prediction, using additional knowledge on family history, mode of inheritance of major genes, and other genetic and non-genetic risk factors. User-friendly software tools have been developed that serve as basis for decision-making in family counseling units. In conclusion, further assessment of cancer risks and model validation is needed, ideally based on prospective cohort studies. To obtain such data, clinical management of carriers and other at-risk individuals should always be accompanied by standardized scientific documentation.

  10. A seven-gene CpG-island methylation panel predicts breast cancer progression

    International Nuclear Information System (INIS)

    Li, Yan; Melnikov, Anatoliy A.; Levenson, Victor; Guerra, Emanuela; Simeone, Pasquale; Alberti, Saverio; Deng, Youping

    2015-01-01

    DNA methylation regulates gene expression, through the inhibition/activation of gene transcription of methylated/unmethylated genes. Hence, DNA methylation profiling can capture pivotal features of gene expression in cancer tissues from patients at the time of diagnosis. In this work, we analyzed a breast cancer case series, to identify DNA methylation determinants of metastatic versus non-metastatic tumors. CpG-island methylation was evaluated on a 56-gene cancer-specific biomarker microarray in metastatic versus non-metastatic breast cancers in a multi-institutional case series of 123 breast cancer patients. Global statistical modeling and unsupervised hierarchical clustering were applied to identify a multi-gene binary classifier with high sensitivity and specificity. Network analysis was utilized to quantify the connectivity of the identified genes. Seven genes (BRCA1, DAPK1, MSH2, CDKN2A, PGR, PRKCDBP, RANKL) were found informative for prognosis of metastatic diffusion and were used to calculate classifier accuracy versus the entire data-set. Individual-gene performances showed sensitivities of 63–79 %, 53–84 % specificities, positive predictive values of 59–83 % and negative predictive values of 63–80 %. When modelled together, these seven genes reached a sensitivity of 93 %, 100 % specificity, a positive predictive value of 100 % and a negative predictive value of 93 %, with high statistical power. Unsupervised hierarchical clustering independently confirmed these findings, in close agreement with the accuracy measurements. Network analyses indicated tight interrelationship between the identified genes, suggesting this to be a functionally-coordinated module, linked to breast cancer progression. Our findings identify CpG-island methylation profiles with deep impact on clinical outcome, paving the way for use as novel prognostic assays in clinical settings. The online version of this article (doi:10.1186/s12885-015-1412-9) contains supplementary

  11. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    Science.gov (United States)

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years mainly in the field of biological research. The empirical methods for candidate gene prioritization that succors to explore the potential pathway between genetic determinants and complex diseases are highly cumbersome and labor intensive. In such a scenario predicting potential targets for a disease state through in silico approaches are of researcher's interest. The prodigious availability of protein interaction data coupled with gene annotation renders an ease in the accurate determination of disease specific candidate genes. In our work we have prioritized the cervix related cancer candidate genes by employing Csaba Ortutay and his co-workers approach of identifying the candidate genes through graph theoretical centrality measures and gene ontology. With the advantage of the human protein interaction data, cervical cancer gene sets and the ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the anticipated candidate genes was corroborated through a literature survey. Also the presence of the drugs for these candidates was detected through Therapeutic Target Database (TTD) and DrugMap Central (DMC) which affirms that they may be endowed as potential drug targets for cervical cancer.

  12. A predictive model of the association between gene polymorphism and the risk of noise-induced hearing loss caused by gunfire noise

    Directory of Open Access Journals (Sweden)

    Ben-Chih Yuan

    2012-01-01

    Conclusion: In this study, we found that although loud noise could usually result in hearing damage, the clinical characteristics of hearing loss were irrelevant to gunfire noise. The gene polymorphisms provide predictors for us to evaluate the risk of NIHL prior to gunshot training.

  13. Inductive matrix completion for predicting gene-disease associations.

    Science.gov (United States)

    Natarajan, Nagarajan; Dhillon, Inderjit S

    2014-06-15

    Most existing methods for predicting causal disease genes rely on specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies-for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies-for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better-it has close to one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.

  14. Model predictive control using fuzzy decision functions

    NARCIS (Netherlands)

    Kaymak, U.; Costa Sousa, da J.M.

    2001-01-01

    Fuzzy predictive control integrates conventional model predictive control with techniques from fuzzy multicriteria decision making, translating the goals and the constraints to predictive control in a transparent way. The information regarding the (fuzzy) goals and the (fuzzy) constraints of the

  15. Prediction of epigenetically regulated genes in breast cancer cell lines

    Energy Technology Data Exchange (ETDEWEB)

    Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen; Nautiyal, Shivani; Flaucher, Diane; Carlton, Victoria EH; Moorhead, Martin; Lu, Yontao; Gray, Joe W; Faham, Malek; Spellman, Paul; Parvin, Bahram

    2010-05-04

    panel of breast cancer cell lines. Subnetwork enrichment of these genes has identifed 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation is documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.

  16. Analysis and prediction of gene splice sites in four Aspergillus genomes

    DEFF Research Database (Denmark)

    Wang, Kai; Ussery, David; Brunak, Søren

    2009-01-01

    Several Aspergillus fungal genomic sequences have been published, with many more in progress. Obviously, it is essential to have high-quality, consistently annotated sets of proteins from each of the genomes, in order to make meaningful comparisons. We have developed a dedicated, publicly available......, splice site prediction program called NetAspGene, for the genus Aspergillus. Gene sequences from Aspergillus fumigatus, the most common mould pathogen, were used to build and test our model. Compared to many animals and plants, Aspergillus contains smaller introns; thus we have applied a larger window...... better splice site prediction than other available tools. NetAspGene will be very helpful for the study in Aspergillus splice sites and especially in alternative splicing. A webpage for NetAspGene is publicly available at http://www.cbs.dtu.dk/services/NetAspGene....

  17. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  18. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  19. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that

  20. Combining many interaction networks to predict gene function and analyze gene lists.

    Science.gov (United States)

    Mostafavi, Sara; Morris, Quaid

    2012-05-01

    In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. The role of gene-gene interaction in the prediction of criminal behavior.

    Science.gov (United States)

    Boutwell, Brian B; Menard, Scott; Barnes, J C; Beaver, Kevin M; Armstrong, Todd A; Boisvert, Danielle

    2014-04-01

    A host of research has examined the possibility that environmental risk factors might condition the influence of genes on various outcomes. Less research, however, has been aimed at exploring the possibility that genetic factors might interact to impact the emergence of human traits. Even fewer studies exist examining the interaction of genes in the prediction of behavioral outcomes. The current study expands this body of research by testing the interaction between genes involved in neural transmission. Our findings suggest that certain dopamine genes interact to increase the odds of criminogenic outcomes in a national sample of Americans. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Full Text Available Abstract Background Vast progress in sequencing projects has called for annotation on a large scale. A Number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO. Now, this method has been made available to the public via our web-service GOPET (Gene Ontology term Prediction and Evaluation Tool. It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO-terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  3. Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes.

    Science.gov (United States)

    Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

    2017-10-03

    Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors, one is composed of 45 elements, characterizing the information entropies of the disease genes distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements with the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.

  4. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Directory of Open Access Journals (Sweden)

    Zhili He

    2018-02-01

    Full Text Available Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN, representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5 increased significantly (P < 0.05 as uranium or nitrate increased, and their changes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning.

  5. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Full Text Available Abstract Background Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents from further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results This paper describes a new prokaryotic genefinding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs and Translation Initiation Sites (TISs. The former is based on a linguistic "Entropy Density Profile" (EDP model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion Results of extensive tests show that MED 2.0 achieves a competitive high performance in the gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of the MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match with the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  6. Relative sensitivity analysis of the predictive properties of sloppy models.

    Science.gov (United States)

    Myasnikova, Ekaterina; Spirov, Alexander

    2018-01-25

    Commonly among the model parameters characterizing complex biological systems are those that do not significantly influence the quality of the fit to experimental data, so-called "sloppy" parameters. The sloppiness can be mathematically expressed through saturating response functions (Hill's, sigmoid) thereby embodying biological mechanisms responsible for the system robustness to external perturbations. However, if a sloppy model is used for the prediction of the system behavior at the altered input (e.g. knock out mutations, natural expression variability), it may demonstrate the poor predictive power due to the ambiguity in the parameter estimates. We introduce a method of the predictive power evaluation under the parameter estimation uncertainty, Relative Sensitivity Analysis. The prediction problem is addressed in the context of gene circuit models describing the dynamics of segmentation gene expression in Drosophila embryo. Gene regulation in these models is introduced by a saturating sigmoid function of the concentrations of the regulatory gene products. We show how our approach can be applied to characterize the essential difference between the sensitivity properties of robust and non-robust solutions and select among the existing solutions those providing the correct system behavior at any reasonable input. In general, the method allows to uncover the sources of incorrect predictions and proposes the way to overcome the estimation uncertainties.

  7. Entropy-based gene ranking without selection bias for the predictive classification of microarray data

    Directory of Open Access Journals (Sweden)

    Serafini Maria

    2003-11-01

    Full Text Available Abstract Background We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process. Results With E-RFE, we speed up the recursive feature elimination (RFE with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Conclusions Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.

  8. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

    Science.gov (United States)

    Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J

    2018-02-02

    Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower ( Brassica oleracea var. botrytis ) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.

  9. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower

    Directory of Open Access Journals (Sweden)

    Patrick Thorwarth

    2018-02-01

    Full Text Available Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding.

  10. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Science.gov (United States)

    Zhang, Ping; Wu, Linwei; Rocha, Andrea M.; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D.; Wu, Liyou; Watson, David B.; Adams, Michael W. W.; Alm, Eric J.; Adams, Paul D.; Arkin, Adam P.

    2018-01-01

    ABSTRACT Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. PMID:29463661

  11. Model Prediction Control For Water Management Using Adaptive Prediction Accuracy

    NARCIS (Netherlands)

    Tian, X.; Negenborn, R.R.; Van Overloop, P.J.A.T.M.; Mostert, E.

    2014-01-01

    In the field of operational water management, Model Predictive Control (MPC) has gained popularity owing to its versatility and flexibility. The MPC controller, which takes predictions, time delay and uncertainties into account, can be designed for multi-objective management problems and for

  12. Modeling the Activity of Single Genes

    Science.gov (United States)

    Mjolsness, Eric; Gibson, Michael

    1999-01-01

    -scale sequencing began with simple organisms, viruses and bacteria, progressed to eukaryotes such as yeast, and more recently (1998) progressed to a multi-cellular animal, the nematode Caenorhabditis elegans. Sequencers have now moved on to the fruit fly Drosophila melanogaster, whose sequence is slated for completion by the end of 1999. The human genome project is expected to determine the complete sequence of all 3 billion bases of human DNA within the next five years. In the wake of genome-scale sequencing, further instrumentation is being developed to assay gene expression and function on a comparably large scale. Much of the work in computational biology focuses on computational tools used in sequencing, finding genes that are related to a particular gene, finding which parts of the DNA code for proteins and which do not, understanding what proteins will be formed from a given length of DNA, predicting how the proteins will fold from a one-dimensional structure into a three dimensional structure, and so on. Much less computational work has been done regarding the function of proteins. One reason for this is that different proteins function very differently, and so work on protein function is very specific to certain classes of proteins. There are, for example, proteins such enzymes that catalyze various intracellular reactions, receptors that respond to extracellular signals and ion channels that regulate the flow of charged particles into and out of the cell. In this chapter, we will consider a particular class of proteins called transcription factors(TFs), which are responsible for regulating when a certain gene is expressed in a certain cell, which cells it is express in, and how much is expressed. Understanding these processes will involve developing a deeper understanding of transcription, translation, and the cellular processes that control those processes. All of these elements fall under the aegis of gene regulation or more narrowly transcriptional regulation. Some of

  13. EvoCor: a platform for predicting functionally related genes using phylogenetic and expression profiles.

    Science.gov (United States)

    Dittmar, W James; McIver, Lauren; Michalak, Pawel; Garner, Harold R; Valdez, Gregorio

    2014-07-01

    The wealth of publicly available gene expression and genomic data provides unique opportunities for computational inference to discover groups of genes that function to control specific cellular processes. Such genes are likely to have co-evolved and be expressed in the same tissues and cells. Unfortunately, the expertise and computational resources required to compare tens of genomes and gene expression data sets make this type of analysis difficult for the average end-user. Here, we describe the implementation of a web server that predicts genes involved in affecting specific cellular processes together with a gene of interest. We termed the server 'EvoCor', to denote that it detects functional relationships among genes through evolutionary analysis and gene expression correlation. This web server integrates profiles of sequence divergence derived by a Hidden Markov Model (HMM) and tissue-wide gene expression patterns to determine putative functional linkages between pairs of genes. This server is easy to use and freely available at http://pilot-hmm.vbi.vt.edu/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. [The value of 5-HTT gene polymorphism for the assessment and prediction of male adolescence violence].

    Science.gov (United States)

    Yu, Yue; Liu, Xiang; Yang, Zhen-xing; Qiu, Chang-jian; Ma, Xiao-hong

    2012-08-01

    To establish an adolescent violence crime prediction model, and to assess the value of serotonin transporter (5-HTT) gene polymorphism for the assessment and prediction of violent crime. Investigative tools were used to analyze the difference in personality dimensions, social support, coping styles, aggressiveness, impulsivity, and family condition scale between 223 adolescents with violence behavior and 148 adolescents without violence behavior. The distribution of 5-HTT gene polymorphisms (5-HTTLPR and 5-HTTVNTR) was compared between the two groups. The role of 5-HTT gene polymorphism on adolescent personality, impulsion and aggression scale also was also analyzed. Stepwise logistic regression was used to establish a predictive model for adolescent violent crime. Significant difference was found between the violence group and the control group on multiple dimensions of psychology and environment scales. However, no statistical difference was found with regard to the 5-HTT genotypes and alleles between adolescents with violent behaviors and normal controls. The rate of prediction accuracy was not significantly improved when 5-HTT gene polymorphism was taken into the model. The violent crime of adolescents was closely related with social and environmental factors. No association was found between 5-HTT polymorphisms and adolescent violence criminal behavior.

  15. Iowa calibration of MEPDG performance prediction models.

    Science.gov (United States)

    2013-06-01

    This study aims to improve the accuracy of AASHTO Mechanistic-Empirical Pavement Design Guide (MEPDG) pavement : performance predictions for Iowa pavement systems through local calibration of MEPDG prediction models. A total of 130 : representative p...

  16. Model complexity control for hydrologic prediction

    NARCIS (Netherlands)

    Schoups, G.; Van de Giesen, N.C.; Savenije, H.H.G.

    2008-01-01

    A common concern in hydrologic modeling is overparameterization of complex models given limited and noisy data. This leads to problems of parameter nonuniqueness and equifinality, which may negatively affect prediction uncertainties. A systematic way of controlling model complexity is therefore

  17. Identification of rat genes by TWINSCAN gene prediction, RT-PCR, and direct sequencing

    DEFF Research Database (Denmark)

    Wu, Jia Qian; Shteynberg, David; Arumugam, Manimozhiyan

    2004-01-01

    an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially...... in the single-intron experiment. Spliced sequences were amplified in 46 cases (34%). We conclude that this procedure for elucidating gene structures with native cDNA sequences is cost-effective and will become even more so as it is further optimized.......The publication of a draft sequence of a third mammalian genome--that of the rat--suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation efforts that are currently being devoted to human. In this paper, we demonstrate...

  18. MGMT methylation analysis of glioblastoma on the Infinium methylation BeadChip identifies two distinct CpG regions associated with gene silencing and outcome, yielding a prediction model for comparisons across datasets, tumor grades, and CIMP-status

    NARCIS (Netherlands)

    P. Bady (Pierre); D. Sciuscio (Davide); A.C. Diserens; J. Bloch (Jocelyne); M.J. van den Bent (Martin); C. Marosi (Christine); P-Y. Dietrich (Pierre Yves); M. Weller (Michael); L. Mariani (Luigi); F.L. Heppner (Frank ); D.R. Mcdonald (David ); D. Lacombe (Denis); R. Stupp (Roger); M. Delorenzi (Mauro); M.E. Hegi (Monika)

    2012-01-01

    textabstractThe methylation status of the O6-methylguanine- DNA methyltransferase (MGMT) gene is an important predictive biomarker for benefit from alkylating agent therapy in glioblastoma. Recent studies in anaplastic glioma suggest a prognostic value for MGMT methylation. Investigation of

  19. Coregulation of terpenoid pathway genes and prediction of isoprene production in Bacillus subtilis using transcriptomics

    Energy Technology Data Exchange (ETDEWEB)

    Hess, Becky M.; Xue, Junfeng; Markillie, Lye Meng; Taylor, Ronald C.; Wiley, H. S.; Ahring, Birgitte K.; Linggi, Bryan E.

    2013-06-19

    The isoprenoid pathway converts pyruvate to isoprene and related isoprenoid compounds in plants and some bacteria. Currently, this pathway is of great interest because of the critical role that isoprenoids play in basic cellular processes as well as the industrial value of metabolites such as isoprene. Although the regulation of several pathway genes has been described, there is a paucity of information regarding the system level regulation and control of the pathway. To address this limitation, we examined Bacillus subtilis grown under multiple conditions and then determined the relationship between altered isoprene production and the pattern of gene expression. We found that terpenoid genes appeared to fall into two distinct subsets with opposing correlations with respect to the amount of isoprene produced. The group whose expression levels positively correlated with isoprene production included dxs, the gene responsible for the commitment step in the pathway, as well as ispD, and two genes that participate in the mevalonate pathway, yhfS and pksG. The subset of terpenoid genes that inversely correlated with isoprene production included ispH, ispF, hepS, uppS, ispE, and dxr. A genome wide partial least squares regression model was created to identify other genes or pathways that contribute to isoprene production. This analysis showed that a subset of 213 regulated genes was sufficient to create a predictive model of isoprene production under different conditions and showed correlations at the transcriptional level. We conclude that gene expression levels alone are sufficiently informative about the metabolic state of a cell that produces increased isoprene and can be used to build a model which accurately predicts production of this secondary metabolite across many simulated environmental conditions.

  20. Nonlinear chaotic model for predicting storm surges

    Directory of Open Access Journals (Sweden)

    M. Siek

    2010-09-01

    Full Text Available This paper addresses the use of the methods of nonlinear dynamics and chaos theory for building a predictive chaotic model from time series. The chaotic model predictions are made by the adaptive local models based on the dynamical neighbors found in the reconstructed phase space of the observables. We implemented the univariate and multivariate chaotic models with direct and multi-steps prediction techniques and optimized these models using an exhaustive search method. The built models were tested for predicting storm surge dynamics for different stormy conditions in the North Sea, and are compared to neural network models. The results show that the chaotic models can generally provide reliable and accurate short-term storm surge predictions.

  1. Staying Power of Churn Prediction Models

    NARCIS (Netherlands)

    Risselada, Hans; Verhoef, Peter C.; Bijmolt, Tammo H. A.

    In this paper, we study the staying power of various churn prediction models. Staying power is defined as the predictive performance of a model in a number of periods after the estimation period. We examine two methods, logit models and classification trees, both with and without applying a bagging

  2. Predictive user modeling with actionable attributes

    NARCIS (Netherlands)

    Zliobaite, I.; Pechenizkiy, M.

    2013-01-01

    Different machine learning techniques have been proposed and used for modeling individual and group user needs, interests and preferences. In the traditional predictive modeling instances are described by observable variables, called attributes. The goal is to learn a model for predicting the target

  3. EFFICIENT PREDICTIVE MODELLING FOR ARCHAEOLOGICAL RESEARCH

    OpenAIRE

    Balla, A.; Pavlogeorgatos, G.; Tsiafakis, D.; Pavlidis, G.

    2014-01-01

    The study presents a general methodology for designing, developing and implementing predictive modelling for identifying areas of archaeological interest. The methodology is based on documented archaeological data and geographical factors, geospatial analysis and predictive modelling, and has been applied to the identification of possible Macedonian tombs’ locations in Northern Greece. The model was tested extensively and the results were validated using a commonly used predictive gain, which...

  4. In silico prediction of novel therapeutic targets using gene-disease association data.

    Science.gov (United States)

    Ferrero, Enrico; Dunham, Ian; Sanseau, Philippe

    2017-08-29

    Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target

  5. Robust predictions of the interacting boson model

    International Nuclear Information System (INIS)

    Casten, R.F.; Koeln Univ.

    1994-01-01

    While most recognized for its symmetries and algebraic structure, the IBA model has other less-well-known but equally intrinsic properties which give unavoidable, parameter-free predictions. These predictions concern central aspects of low-energy nuclear collective structure. This paper outlines these ''robust'' predictions and compares them with the data

  6. Comparison of Prediction-Error-Modelling Criteria

    DEFF Research Database (Denmark)

    Jørgensen, John Bagterp; Jørgensen, Sten Bay

    2007-01-01

    Single and multi-step prediction-error-methods based on the maximum likelihood and least squares criteria are compared. The prediction-error methods studied are based on predictions using the Kalman filter and Kalman predictors for a linear discrete-time stochastic state space model, which is a r...

  7. Landscape genetics as a tool for conservation planning: predicting the effects of landscape change on gene flow.

    Science.gov (United States)

    van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine

    2014-03-01

    For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (landscape composition as well as some measures of habitat configuration. Additionally, a complete sampling of all populations in our study area allowed incorporating measures of population topology. These measures together with the landscape metrics formed the predictor variables in linear models with gene flow as response variable (F(ST) and mean pairwise assignment probability). With a modified leave-one-out cross-validation approach, we selected the model with the highest predictive accuracy. With this model, we predicted gene flow under several landscape-change scenarios, which simulated construction, rezoning or restoration projects, and the establishment of a new population. For some landscape-change scenarios, significant increase or decrease in gene flow was predicted, while for others little change was forecast. Furthermore, we found that the measures of population topology strongly increase model fit in landscape genetic analysis. This study demonstrates the use of predictive landscape-genetic models in conservation and landscape planning.

  8. Current approaches to gene regulatory network modelling

    Directory of Open Access Journals (Sweden)

    Brazma Alvis

    2007-09-01

    Full Text Available Abstract Many different approaches have been developed to model and simulate gene regulatory networks. We proposed the following categories for gene regulatory network models: network parts lists, network topology models, network control logic models, and dynamic models. Here we will describe some examples for each of these categories. We will study the topology of gene regulatory networks in yeast in more detail, comparing a direct network derived from transcription factor binding data and an indirect network derived from genome-wide expression data in mutants. Regarding the network dynamics we briefly describe discrete and continuous approaches to network modelling, then describe a hybrid model called Finite State Linear Model and demonstrate that some simple network dynamics can be simulated in this model.

  9. Extracting falsifiable predictions from sloppy models.

    Science.gov (United States)

    Gutenkunst, Ryan N; Casey, Fergal P; Waterfall, Joshua J; Myers, Christopher R; Sethna, James P

    2007-12-01

    Successful predictions are among the most compelling validations of any model. Extracting falsifiable predictions from nonlinear multiparameter models is complicated by the fact that such models are commonly sloppy, possessing sensitivities to different parameter combinations that range over many decades. Here we discuss how sloppiness affects the sorts of data that best constrain model predictions, makes linear uncertainty approximations dangerous, and introduces computational difficulties in Monte-Carlo uncertainty analysis. We also present a useful test problem and suggest refinements to the standards by which models are communicated.

  10. The prediction of epidemics through mathematical modeling.

    Science.gov (United States)

    Schaus, Catherine

    2014-01-01

    Mathematical models may be resorted to in an endeavor to predict the development of epidemics. The SIR model is one of the applications. Still too approximate, the use of statistics awaits more data in order to come closer to reality.

  11. Calibration of PMIS pavement performance prediction models.

    Science.gov (United States)

    2012-02-01

    Improve the accuracy of TxDOTs existing pavement performance prediction models through calibrating these models using actual field data obtained from the Pavement Management Information System (PMIS). : Ensure logical performance superiority patte...

  12. Evaluating Predictive Uncertainty of Hyporheic Exchange Modelling

    Science.gov (United States)

    Chow, R.; Bennett, J.; Dugge, J.; Wöhling, T.; Nowak, W.

    2017-12-01

    Hyporheic exchange is the interaction of water between rivers and groundwater, and is difficult to predict. One of the largest contributions to predictive uncertainty for hyporheic fluxes have been attributed to the representation of heterogeneous subsurface properties. This research aims to evaluate which aspect of the subsurface representation - the spatial distribution of hydrofacies or the model for local-scale (within-facies) heterogeneity - most influences the predictive uncertainty. Also, we seek to identify data types that help reduce this uncertainty best. For this investigation, we conduct a modelling study of the Steinlach River meander, in Southwest Germany. The Steinlach River meander is an experimental site established in 2010 to monitor hyporheic exchange at the meander scale. We use HydroGeoSphere, a fully integrated surface water-groundwater model, to model hyporheic exchange and to assess the predictive uncertainty of hyporheic exchange transit times (HETT). A highly parameterized complex model is built and treated as `virtual reality', which is in turn modelled with simpler subsurface parameterization schemes (Figure). Then, we conduct Monte-Carlo simulations with these models to estimate the predictive uncertainty. Results indicate that: Uncertainty in HETT is relatively small for early times and increases with transit times. Uncertainty from local-scale heterogeneity is negligible compared to uncertainty in the hydrofacies distribution. Introducing more data to a poor model structure may reduce predictive variance, but does not reduce predictive bias. Hydraulic head observations alone cannot constrain the uncertainty of HETT, however an estimate of hyporheic exchange flux proves to be more effective at reducing this uncertainty. Figure: Approach for evaluating predictive model uncertainty. A conceptual model is first developed from the field investigations. A complex model (`virtual reality') is then developed based on that conceptual model

  13. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Science.gov (United States)

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity which is derived based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Comparing with the previous models, the proposed model can reveal more biological sense.

  14. Case studies in archaeological predictive modelling

    NARCIS (Netherlands)

    Verhagen, Jacobus Wilhelmus Hermanus Philippus

    2007-01-01

    In this thesis, a collection of papers is put together dealing with various quantitative aspects of predictive modelling and archaeological prospection. Among the issues covered are the effects of survey bias on the archaeological data used for predictive modelling, and the complexities of testing

  15. Clinical Prediction Models for Cardiovascular Disease: Tufts Predictive Analytics and Comparative Effectiveness Clinical Prediction Model Database.

    Science.gov (United States)

    Wessler, Benjamin S; Lai Yh, Lana; Kramer, Whitney; Cangelosi, Michael; Raman, Gowri; Lutz, Jennifer S; Kent, David M

    2015-07-01

    Clinical prediction models (CPMs) estimate the probability of clinical outcomes and hold the potential to improve decision making and individualize care. For patients with cardiovascular disease, there are numerous CPMs available although the extent of this literature is not well described. We conducted a systematic review for articles containing CPMs for cardiovascular disease published between January 1990 and May 2012. Cardiovascular disease includes coronary heart disease, heart failure, arrhythmias, stroke, venous thromboembolism, and peripheral vascular disease. We created a novel database and characterized CPMs based on the stage of development, population under study, performance, covariates, and predicted outcomes. There are 796 models included in this database. The number of CPMs published each year is increasing steadily over time. Seven hundred seventeen (90%) are de novo CPMs, 21 (3%) are CPM recalibrations, and 58 (7%) are CPM adaptations. This database contains CPMs for 31 index conditions, including 215 CPMs for patients with coronary artery disease, 168 CPMs for population samples, and 79 models for patients with heart failure. There are 77 distinct index/outcome pairings. Of the de novo models in this database, 450 (63%) report a c-statistic and 259 (36%) report some information on calibration. There is an abundance of CPMs available for a wide assortment of cardiovascular disease conditions, with substantial redundancy in the literature. The comparative performance of these models, the consistency of effects and risk estimates across models and the actual and potential clinical impact of this body of literature is poorly understood. © 2015 American Heart Association, Inc.

  16. Incorporating uncertainty in predictive species distribution modelling.

    Science.gov (United States)

    Beale, Colin M; Lennon, Jack J

    2012-01-19

    Motivated by the need to solve ecological problems (climate change, habitat fragmentation and biological invasions), there has been increasing interest in species distribution models (SDMs). Predictions from these models inform conservation policy, invasive species management and disease-control measures. However, predictions are subject to uncertainty, the degree and source of which is often unrecognized. Here, we review the SDM literature in the context of uncertainty, focusing on three main classes of SDM: niche-based models, demographic models and process-based models. We identify sources of uncertainty for each class and discuss how uncertainty can be minimized or included in the modelling process to give realistic measures of confidence around predictions. Because this has typically not been performed, we conclude that uncertainty in SDMs has often been underestimated and a false precision assigned to predictions of geographical distribution. We identify areas where development of new statistical tools will improve predictions from distribution models, notably the development of hierarchical models that link different types of distribution model and their attendant uncertainties across spatial scales. Finally, we discuss the need to develop more defensible methods for assessing predictive performance, quantifying model goodness-of-fit and for assessing the significance of model covariates.

  17. Model Predictive Control for Smart Energy Systems

    DEFF Research Database (Denmark)

    Halvgaard, Rasmus

    pumps, heat tanks, electrical vehicle battery charging/discharging, wind farms, power plants). 2.Embed forecasting methodologies for the weather (e.g. temperature, solar radiation), the electricity consumption, and the electricity price in a predictive control system. 3.Develop optimization algorithms....... Chapter 3 introduces Model Predictive Control (MPC) including state estimation, filtering and prediction for linear models. Chapter 4 simulates the models from Chapter 2 with the certainty equivalent MPC from Chapter 3. An economic MPC minimizes the costs of consumption based on real electricity prices...... that determined the flexibility of the units. A predictive control system easily handles constraints, e.g. limitations in power consumption, and predicts the future behavior of a unit by integrating predictions of electricity prices, consumption, and weather variables. The simulations demonstrate the expected...

  18. Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer

    Directory of Open Access Journals (Sweden)

    Pingzhao Hu

    2006-01-01

    biological pathways. In particular, we observed that by integrating information from the insulin signalling pathway into our prediction model, we achieved better prediction of prostate cancer. Conclusions: Our data integration methodology provides an efficient way to identify biologically sound and statistically significant pathways from gene expression data. The significant gene expression phenotypes identified in our study have the potential to characterize complex genetic alterations in prostate cancer.

  19. Evaluating the Predictive Value of Growth Prediction Models

    Science.gov (United States)

    Murphy, Daniel L.; Gaertner, Matthew N.

    2014-01-01

    This study evaluates four growth prediction models--projection, student growth percentile, trajectory, and transition table--commonly used to forecast (and give schools credit for) middle school students' future proficiency. Analyses focused on vertically scaled summative mathematics assessments, and two performance standards conditions (high…

  20. Model predictive control classical, robust and stochastic

    CERN Document Server

    Kouvaritakis, Basil

    2016-01-01

    For the first time, a textbook that brings together classical predictive control with treatment of up-to-date robust and stochastic techniques. Model Predictive Control describes the development of tractable algorithms for uncertain, stochastic, constrained systems. The starting point is classical predictive control and the appropriate formulation of performance objectives and constraints to provide guarantees of closed-loop stability and performance. Moving on to robust predictive control, the text explains how similar guarantees may be obtained for cases in which the model describing the system dynamics is subject to additive disturbances and parametric uncertainties. Open- and closed-loop optimization are considered and the state of the art in computationally tractable methods based on uncertainty tubes presented for systems with additive model uncertainty. Finally, the tube framework is also applied to model predictive control problems involving hard or probabilistic constraints for the cases of multiplic...

  1. Modeling, robust and distributed model predictive control for freeway networks

    NARCIS (Netherlands)

    Liu, S.

    2016-01-01

    In Model Predictive Control (MPC) for traffic networks, traffic models are crucial since they are used as prediction models for determining the optimal control actions. In order to reduce the computational complexity of MPC for traffic networks, macroscopic traffic models are often used instead of

  2. Deep Predictive Models in Interactive Music

    OpenAIRE

    Martin, Charles P.; Ellefsen, Kai Olav; Torresen, Jim

    2018-01-01

    Automatic music generation is a compelling task where much recent progress has been made with deep learning models. In this paper, we ask how these models can be integrated into interactive music systems; how can they encourage or enhance the music making of human users? Musical performance requires prediction to operate instruments, and perform in groups. We argue that predictive models could help interactive systems to understand their temporal context, and ensemble behaviour. Deep learning...

  3. Identification of a robust gene signature that predicts breast cancer outcome in independent data sets

    International Nuclear Information System (INIS)

    Korkola, James E; Waldman, Frederic M; Blaveri, Ekaterina; DeVries, Sandy; Moore, Dan H II; Hwang, E Shelley; Chen, Yunn-Yi; Estep, Anne LH; Chew, Karen L; Jensen, Ronald H

    2007-01-01

    Breast cancer is a heterogeneous disease, presenting with a wide range of histologic, clinical, and genetic features. Microarray technology has shown promise in predicting outcome in these patients. We profiled 162 breast tumors using expression microarrays to stratify tumors based on gene expression. A subset of 55 tumors with extensive follow-up was used to identify gene sets that predicted outcome. The predictive gene set was further tested in previously published data sets. We used different statistical methods to identify three gene sets associated with disease free survival. A fourth gene set, consisting of 21 genes in common to all three sets, also had the ability to predict patient outcome. To validate the predictive utility of this derived gene set, it was tested in two published data sets from other groups. This gene set resulted in significant separation of patients on the basis of survival in these data sets, correctly predicting outcome in 62–65% of patients. By comparing outcome prediction within subgroups based on ER status, grade, and nodal status, we found that our gene set was most effective in predicting outcome in ER positive and node negative tumors. This robust gene selection with extensive validation has identified a predictive gene set that may have clinical utility for outcome prediction in breast cancer patients

  4. Evolutionary modeling and prediction of non-coding RNAs in Drosophila.

    Directory of Open Access Journals (Sweden)

    Robert K Bradley

    2009-08-01

    Full Text Available We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and subfamilies of ncRNA gene. Our estimates for false positive rates are based on simulations which preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously-published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3' end of coding regions and furthermore depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 Kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA.

  5. Predictive gene lists for breast cancer prognosis: A topographic visualisation study

    Directory of Open Access Journals (Sweden)

    Lowe David

    2008-04-01

    Full Text Available Abstract Background The controversy surrounding the non-uniqueness of predictive gene lists (PGL of small selected subsets of genes from very large potential candidates as available in DNA microarray experiments is now widely acknowledged 1. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, the Stochastic Neighbor Embedding(SNE and the Locally Linear Embedding(LLE techniques have been used to construct two-dimensional projective visualisation plots of 70 dimensional PGLs per patient, classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques and investigate whether a-posteriori two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between two prognosis patients. Uncertainty and diversity across multiple gene expressions prevents unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results. Conclusion The random correlation effect to an arbitrary outcome induced by small subset selection from very high

  6. Can Thrifty Gene(s or Predictive Fetal Programming for Thriftiness Lead to Obesity?

    Directory of Open Access Journals (Sweden)

    Ulfat Baig

    2011-01-01

    Full Text Available Obesity and related disorders are thought to have their roots in metabolic “thriftiness” that evolved to combat periodic starvation. The association of low birth weight with obesity in later life caused a shift in the concept from thrifty gene to thrifty phenotype or anticipatory fetal programming. The assumption of thriftiness is implicit in obesity research. We examine here, with the help of a mathematical model, the conditions for evolution of thrifty genes or fetal programming for thriftiness. The model suggests that a thrifty gene cannot exist in a stable polymorphic state in a population. The conditions for evolution of thrifty fetal programming are restricted if the correlation between intrauterine and lifetime conditions is poor. Such a correlation is not observed in natural courses of famine. If there is fetal programming for thriftiness, it could have evolved in anticipation of social factors affecting nutrition that can result in a positive correlation.

  7. Unreachable Setpoints in Model Predictive Control

    DEFF Research Database (Denmark)

    Rawlings, James B.; Bonné, Dennis; Jørgensen, John Bagterp

    2008-01-01

    In this work, a new model predictive controller is developed that handles unreachable setpoints better than traditional model predictive control methods. The new controller induces an interesting fast/slow asymmetry in the tracking response of the system. Nominal asymptotic stability of the optimal...... steady state is established for terminal constraint model predictive control (MPC). The region of attraction is the steerable set. Existing analysis methods for closed-loop properties of MPC are not applicable to this new formulation, and a new analysis method is developed. It is shown how to extend...

  8. Bayesian Predictive Models for Rayleigh Wind Speed

    DEFF Research Database (Denmark)

    Shahirinia, Amir; Hajizadeh, Amin; Yu, David C

    2017-01-01

    predictive model of the wind speed aggregates the non-homogeneous distributions into a single continuous distribution. Therefore, the result is able to capture the variation among the probability distributions of the wind speeds at the turbines’ locations in a wind farm. More specifically, instead of using...... a wind speed distribution whose parameters are known or estimated, the parameters are considered as random whose variations are according to probability distributions. The Bayesian predictive model for a Rayleigh which only has a single model scale parameter has been proposed. Also closed-form posterior...... and predictive inferences under different reasonable choices of prior distribution in sensitivity analysis have been presented....

  9. Gene expression prediction by soft integration and the elastic net-best performance of the DREAM3 gene expression challenge.

    Directory of Open Access Journals (Sweden)

    Mika Gustafsson

    Full Text Available BACKGROUND: To predict gene expressions is an important endeavour within computational systems biology. It can both be a way to explore how drugs affect the system, as well as providing a framework for finding which genes are interrelated in a certain process. A practical problem, however, is how to assess and discriminate among the various algorithms which have been developed for this purpose. Therefore, the DREAM project invited the year 2008 to a challenge for predicting gene expression values, and here we present the algorithm with best performance. METHODOLOGY/PRINCIPAL FINDINGS: We develop an algorithm by exploring various regression schemes with different model selection procedures. It turns out that the most effective scheme is based on least squares, with a penalty term of a recently developed form called the "elastic net". Key components in the algorithm are the integration of expression data from other experimental conditions than those presented for the challenge and the utilization of transcription factor binding data for guiding the inference process towards known interactions. Of importance is also a cross-validation procedure where each form of external data is used only to the extent it increases the expected performance. CONCLUSIONS/SIGNIFICANCE: Our algorithm proves both the possibility to extract information from large-scale expression data concerning prediction of gene levels, as well as the benefits of integrating different data sources for improving the inference. We believe the former is an important message to those still hesitating on the possibilities for computational approaches, while the latter is part of an important way forward for the future development of the field of computational systems biology.

  10. Predictive Modelling and Time: An Experiment in Temporal Archaeological Predictive Models

    OpenAIRE

    David Ebert

    2006-01-01

    One of the most common criticisms of archaeological predictive modelling is that it fails to account for temporal or functional differences in sites. However, a practical solution to temporal or functional predictive modelling has proven to be elusive. This article discusses temporal predictive modelling, focusing on the difficulties of employing temporal variables, then introduces and tests a simple methodology for the implementation of temporal modelling. The temporal models thus created ar...

  11. Fingerprint verification prediction model in hand dermatitis.

    Science.gov (United States)

    Lee, Chew K; Chang, Choong C; Johor, Asmah; Othman, Puwira; Baba, Roshidah

    2015-07-01

    Hand dermatitis associated fingerprint changes is a significant problem and affects fingerprint verification processes. This study was done to develop a clinically useful prediction model for fingerprint verification in patients with hand dermatitis. A case-control study involving 100 patients with hand dermatitis. All patients verified their thumbprints against their identity card. Registered fingerprints were randomized into a model derivation and model validation group. Predictive model was derived using multiple logistic regression. Validation was done using the goodness-of-fit test. The fingerprint verification prediction model consists of a major criterion (fingerprint dystrophy area of ≥ 25%) and two minor criteria (long horizontal lines and long vertical lines). The presence of the major criterion predicts it will almost always fail verification, while presence of both minor criteria and presence of one minor criterion predict high and low risk of fingerprint verification failure, respectively. When none of the criteria are met, the fingerprint almost always passes the verification. The area under the receiver operating characteristic curve was 0.937, and the goodness-of-fit test showed agreement between the observed and expected number (P = 0.26). The derived fingerprint verification failure prediction model is validated and highly discriminatory in predicting risk of fingerprint verification in patients with hand dermatitis. © 2014 The International Society of Dermatology.

  12. Massive Predictive Modeling using Oracle R Enterprise

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    R is fast becoming the lingua franca for analyzing data via statistics, visualization, and predictive analytics. For enterprise-scale data, R users have three main concerns: scalability, performance, and production deployment. Oracle's R-based technologies - Oracle R Distribution, Oracle R Enterprise, Oracle R Connector for Hadoop, and the R package ROracle - address these concerns. In this talk, we introduce Oracle's R technologies, highlighting how each enables R users to achieve scalability and performance while making production deployment of R results a natural outcome of the data analyst/scientist efforts. The focus then turns to Oracle R Enterprise with code examples using the transparency layer and embedded R execution, targeting massive predictive modeling. One goal behind massive predictive modeling is to build models per entity, such as customers, zip codes, simulations, in an effort to understand behavior and tailor predictions at the entity level. Predictions...

  13. PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes.

    Science.gov (United States)

    Osuna-Cruz, Cristina M; Paytuvi-Gallart, Andreu; Di Donato, Antimo; Sundesha, Vicky; Andolfo, Giuseppe; Aiese Cigliano, Riccardo; Sanseverino, Walter; Ercolano, Maria R

    2018-01-04

    The Plant Resistance Genes database (PRGdb; http://prgdb.org) has been redesigned with a new user interface, new sections, new tools and new data for genetic improvement, allowing easy access not only to the plant science research community but also to breeders who want to improve plant disease resistance. The home page offers an overview of easy-to-read search boxes that streamline data queries and directly show plant species for which data from candidate or cloned genes have been collected. Bulk data files and curated resistance gene annotations are made available for each plant species hosted. The new Gene Model view offers detailed information on each cloned resistance gene structure to highlight shared attributes with other genes. PRGdb 3.0 offers 153 reference resistance genes and 177 072 annotated candidate Pathogen Receptor Genes (PRGs). Compared to the previous release, the number of putative genes has been increased from 106 to 177 K from 76 sequenced Viridiplantae and algae genomes. The DRAGO 2 tool, which automatically annotates and predicts (PRGs) from DNA and amino acid with high accuracy and sensitivity, has been added. BLAST search has been implemented to offer users the opportunity to annotate and compare their own sequences. The improved section on plant diseases displays useful information linked to genes and genomes to connect complementary data and better address specific needs. Through, a revised and enlarged collection of data, the development of new tools and a renewed portal, PRGdb 3.0 engages the plant science community in developing a consensus plan to improve knowledge and strategies to fight diseases that afflict main crops and other plants. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Multi-model analysis in hydrological prediction

    Science.gov (United States)

    Lanthier, M.; Arsenault, R.; Brissette, F.

    2017-12-01

    Hydrologic modelling, by nature, is a simplification of the real-world hydrologic system. Therefore ensemble hydrological predictions thus obtained do not present the full range of possible streamflow outcomes, thereby producing ensembles which demonstrate errors in variance such as under-dispersion. Past studies show that lumped models used in prediction mode can return satisfactory results, especially when there is not enough information available on the watershed to run a distributed model. But all lumped models greatly simplify the complex processes of the hydrologic cycle. To generate more spread in the hydrologic ensemble predictions, multi-model ensembles have been considered. In this study, the aim is to propose and analyse a method that gives an ensemble streamflow prediction that properly represents the forecast probabilities and reduced ensemble bias. To achieve this, three simple lumped models are used to generate an ensemble. These will also be combined using multi-model averaging techniques, which generally generate a more accurate hydrogram than the best of the individual models in simulation mode. This new predictive combined hydrogram is added to the ensemble, thus creating a large ensemble which may improve the variability while also improving the ensemble mean bias. The quality of the predictions is then assessed on different periods: 2 weeks, 1 month, 3 months and 6 months using a PIT Histogram of the percentiles of the real observation volumes with respect to the volumes of the ensemble members. Initially, the models were run using historical weather data to generate synthetic flows. This worked for individual models, but not for the multi-model and for the large ensemble. Consequently, by performing data assimilation at each prediction period and thus adjusting the initial states of the models, the PIT Histogram could be constructed using the observed flows while allowing the use of the multi-model predictions. The under-dispersion has been

  15. Prostate Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing prostate cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  16. Colorectal Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing colorectal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  17. Esophageal Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing esophageal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  18. Bladder Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing bladder cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  19. Lung Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing lung cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  20. Breast Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing breast cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  1. Pancreatic Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing pancreatic cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  2. Ovarian Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing ovarian cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  3. Liver Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing liver cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  4. Testicular Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of testicular cervical cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  5. Cervical Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing cervical cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  6. Modeling and Prediction Using Stochastic Differential Equations

    DEFF Research Database (Denmark)

    Juhl, Rune; Møller, Jan Kloppenborg; Jørgensen, John Bagterp

    2016-01-01

    Pharmacokinetic/pharmakodynamic (PK/PD) modeling for a single subject is most often performed using nonlinear models based on deterministic ordinary differential equations (ODEs), and the variation between subjects in a population of subjects is described using a population (mixed effects) setup...... deterministic and can predict the future perfectly. A more realistic approach would be to allow for randomness in the model due to e.g., the model be too simple or errors in input. We describe a modeling and prediction setup which better reflects reality and suggests stochastic differential equations (SDEs...

  7. Predictive Model of Systemic Toxicity (SOT)

    Science.gov (United States)

    In an effort to ensure chemical safety in light of regulatory advances away from reliance on animal testing, USEPA and L’Oréal have collaborated to develop a quantitative systemic toxicity prediction model. Prediction of human systemic toxicity has proved difficult and remains a ...

  8. CRC-113 gene expression signature for predicting prognosis in patients with colorectal cancer.

    Science.gov (United States)

    Nguyen, Minh Nam; Choi, Tae Gyu; Nguyen, Dinh Truong; Kim, Jin-Hwan; Jo, Yong Hwa; Shahid, Muhammad; Akter, Salima; Aryal, Saurav Nath; Yoo, Ji Youn; Ahn, Yong-Joo; Cho, Kyoung Min; Lee, Ju-Seog; Choe, Wonchae; Kang, Insug; Ha, Joohun; Kim, Sung Soo

    2015-10-13

    Colorectal cancer (CRC) is the third leading cause of global cancer mortality. Recent studies have proposed several gene signatures to predict CRC prognosis, but none of those have proven reliable for predicting prognosis in clinical practice yet due to poor reproducibility and molecular heterogeneity. Here, we have established a prognostic signature of 113 probe sets (CRC-113) that include potential biomarkers and reflect the biological and clinical characteristics. Robustness and accuracy were significantly validated in external data sets from 19 centers in five countries. In multivariate analysis, CRC-113 gene signature showed a stronger prognostic value for survival and disease recurrence in CRC patients than current clinicopathological risk factors and molecular alterations. We also demonstrated that the CRC-113 gene signature reflected both genetic and epigenetic molecular heterogeneity in CRC patients. Furthermore, incorporation of the CRC-113 gene signature into a clinical context and molecular markers further refined the selection of the CRC patients who might benefit from postoperative chemotherapy. Conclusively, CRC-113 gene signature provides new possibilities for improving prognostic models and personalized therapeutic strategies.

  9. A model of gene-gene and gene-environment interactions and its implications for targeting environmental interventions by genotype

    Directory of Open Access Journals (Sweden)

    Wallace Helen M

    2006-10-01

    Full Text Available Abstract Background The potential public health benefits of targeting environmental interventions by genotype depend on the environmental and genetic contributions to the variance of common diseases, and the magnitude of any gene-environment interaction. In the absence of prior knowledge of all risk factors, twin, family and environmental data may help to define the potential limits of these benefits in a given population. However, a general methodology to analyze twin data is required because of the potential importance of gene-gene interactions (epistasis, gene-environment interactions, and conditions that break the 'equal environments' assumption for monozygotic and dizygotic twins. Method A new model for gene-gene and gene-environment interactions is developed that abandons the assumptions of the classical twin study, including Fisher's (1918 assumption that genes act as risk factors for common traits in a manner necessarily dominated by an additive polygenic term. Provided there are no confounders, the model can be used to implement a top-down approach to quantifying the potential utility of genetic prediction and prevention, using twin, family and environmental data. The results describe a solution space for each disease or trait, which may or may not include the classical twin study result. Each point in the solution space corresponds to a different model of genotypic risk and gene-environment interaction. Conclusion The results show that the potential for reducing the incidence of common diseases using environmental interventions targeted by genotype may be limited, except in special cases. The model also confirms that the importance of an individual's genotype in determining their risk of complex diseases tends to be exaggerated by the classical twin studies method, owing to the 'equal environments' assumption and the assumption of no gene-environment interaction. In addition, if phenotypes are genetically robust, because of epistasis

  10. Spent fuel: prediction model development

    International Nuclear Information System (INIS)

    Almassy, M.Y.; Bosi, D.M.; Cantley, D.A.

    1979-07-01

    The need for spent fuel disposal performance modeling stems from a requirement to assess the risks involved with deep geologic disposal of spent fuel, and to support licensing and public acceptance of spent fuel repositories. Through the balanced program of analysis, diagnostic testing, and disposal demonstration tests, highlighted in this presentation, the goal of defining risks and of quantifying fuel performance during long-term disposal can be attained

  11. Navy Recruit Attrition Prediction Modeling

    Science.gov (United States)

    2014-09-01

    have high correlation with attrition, such as age, job characteristics, command climate, marital status, behavior issues prior to recruitment, and the...the additive model. glm(formula = Outcome ~ Age + Gender + Marital + AFQTCat + Pay + Ed + Dep, family = binomial, data = ltraining) Deviance ...0.1 ‘ ‘ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance : 105441 on 85221 degrees of freedom Residual deviance

  12. Predicting and Modeling RNA Architecture

    Science.gov (United States)

    Westhof, Eric; Masquida, Benoît; Jossinet, Fabrice

    2011-01-01

    SUMMARY A general approach for modeling the architecture of large and structured RNA molecules is described. The method exploits the modularity and the hierarchical folding of RNA architecture that is viewed as the assembly of preformed double-stranded helices defined by Watson-Crick base pairs and RNA modules maintained by non-Watson-Crick base pairs. Despite the extensive molecular neutrality observed in RNA structures, specificity in RNA folding is achieved through global constraints like lengths of helices, coaxiality of helical stacks, and structures adopted at the junctions of helices. The Assemble integrated suite of computer tools allows for sequence and structure analysis as well as interactive modeling by homology or ab initio assembly with possibilities for fitting within electronic density maps. The local key role of non-Watson-Crick pairs guides RNA architecture formation and offers metrics for assessing the accuracy of three-dimensional models in a more useful way than usual root mean square deviation (RMSD) values. PMID:20504963

  13. Predictive Models and Computational Toxicology (II IBAMTOX)

    Science.gov (United States)

    EPA’s ‘virtual embryo’ project is building an integrative systems biology framework for predictive models of developmental toxicity. One schema involves a knowledge-driven adverse outcome pathway (AOP) framework utilizing information from public databases, standardized ontologies...

  14. Finding furfural hydrogenation catalysts via predictive modelling

    NARCIS (Netherlands)

    Strassberger, Z.; Mooijman, M.; Ruijter, E.; Alberts, A.H.; Maldonado, A.G.; Orru, R.V.A.; Rothenberg, G.

    2010-01-01

    We combine multicomponent reactions, catalytic performance studies and predictive modelling to find transfer hydrogenation catalysts. An initial set of 18 ruthenium-carbene complexes were synthesized and screened in the transfer hydrogenation of furfural to furfurol with isopropyl alcohol complexes

  15. FINITE ELEMENT MODEL FOR PREDICTING RESIDUAL ...

    African Journals Online (AJOL)

    FINITE ELEMENT MODEL FOR PREDICTING RESIDUAL STRESSES IN ... the transverse residual stress in the x-direction (σx) had a maximum value of 375MPa ... the finite element method are in fair agreement with the experimental results.

  16. Evaluation of CASP8 model quality predictions

    KAUST Repository

    Cozzetto, Domenico; Kryshtafovych, Andriy; Tramontano, Anna

    2009-01-01

    established a prediction category to evaluate their performance in 2006. In 2008 the experiment was repeated and its results are reported here. Participants were invited to infer the correctness of the protein models submitted by the registered automatic

  17. Predictive gene testing for Huntington disease and other neurodegenerative disorders.

    Science.gov (United States)

    Wedderburn, S; Panegyres, P K; Andrew, S; Goldblatt, J; Liebeck, T; McGrath, F; Wiltshire, M; Pestell, C; Lee, J; Beilby, J

    2013-12-01

    Controversies exist around predictive testing (PT) programmes in neurodegenerative disorders. This study sets out to answer the following questions relating to Huntington disease (HD) and other neurodegenerative disorders: differences between these patients in their PT journeys, why and when individuals withdraw from PT, and decision-making processes regarding reproductive genetic testing. A case series analysis of patients having PT from the multidisciplinary Western Australian centre for PT over the past 20 years was performed using internationally recognised guidelines for predictive gene testing in neurodegenerative disorders. Of 740 at-risk patients, 518 applied for PT: 466 at risk of HD, 52 at risk of other neurodegenerative disorders - spinocerebellar ataxias, hereditary prion disease and familial Alzheimer disease. Thirteen percent withdrew from PT - 80.32% of withdrawals occurred during counselling stages. Major withdrawal reasons related to timing in the patients' lives or unknown as the patient did not disclose the reason. Thirty-eight HD individuals had reproductive genetic testing: 34 initiated prenatal testing (of which eight withdrew from the process) and four initiated pre-implantation genetic diagnosis. There was no recorded or other evidence of major psychological reactions or suicides during PT. People withdrew from PT in relation to life stages and reasons that are unknown. Our findings emphasise the importance of: (i) adherence to internationally recommended guidelines for PT; (ii) the role of the multidisciplinary team in risk minimisation; and (iii) patient selection. © 2013 The Authors; Internal Medicine Journal © 2013 Royal Australasian College of Physicians.

  18. Network-based prediction and knowledge mining of disease genes.

    Science.gov (United States)

    Carson, Matthew B; Lu, Hui

    2015-01-01

    In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second-order neighbors in the PPI network

  19. Mental models accurately predict emotion transitions.

    Science.gov (United States)

    Thornton, Mark A; Tamir, Diana I

    2017-06-06

    Successful social interactions depend on people's ability to predict others' future actions and emotions. People possess many mechanisms for perceiving others' current emotional states, but how might they use this information to predict others' future states? We hypothesized that people might capitalize on an overlooked aspect of affective experience: current emotions predict future emotions. By attending to regularities in emotion transitions, perceivers might develop accurate mental models of others' emotional dynamics. People could then use these mental models of emotion transitions to predict others' future emotions from currently observable emotions. To test this hypothesis, studies 1-3 used data from three extant experience-sampling datasets to establish the actual rates of emotional transitions. We then collected three parallel datasets in which participants rated the transition likelihoods between the same set of emotions. Participants' ratings of emotion transitions predicted others' experienced transitional likelihoods with high accuracy. Study 4 demonstrated that four conceptual dimensions of mental state representation-valence, social impact, rationality, and human mind-inform participants' mental models. Study 5 used 2 million emotion reports on the Experience Project to replicate both of these findings: again people reported accurate models of emotion transitions, and these models were informed by the same four conceptual dimensions. Importantly, neither these conceptual dimensions nor holistic similarity could fully explain participants' accuracy, suggesting that their mental models contain accurate information about emotion dynamics above and beyond what might be predicted by static emotion knowledge alone.

  20. Mental models accurately predict emotion transitions

    Science.gov (United States)

    Thornton, Mark A.; Tamir, Diana I.

    2017-01-01

    Successful social interactions depend on people’s ability to predict others’ future actions and emotions. People possess many mechanisms for perceiving others’ current emotional states, but how might they use this information to predict others’ future states? We hypothesized that people might capitalize on an overlooked aspect of affective experience: current emotions predict future emotions. By attending to regularities in emotion transitions, perceivers might develop accurate mental models of others’ emotional dynamics. People could then use these mental models of emotion transitions to predict others’ future emotions from currently observable emotions. To test this hypothesis, studies 1–3 used data from three extant experience-sampling datasets to establish the actual rates of emotional transitions. We then collected three parallel datasets in which participants rated the transition likelihoods between the same set of emotions. Participants’ ratings of emotion transitions predicted others’ experienced transitional likelihoods with high accuracy. Study 4 demonstrated that four conceptual dimensions of mental state representation—valence, social impact, rationality, and human mind—inform participants’ mental models. Study 5 used 2 million emotion reports on the Experience Project to replicate both of these findings: again people reported accurate models of emotion transitions, and these models were informed by the same four conceptual dimensions. Importantly, neither these conceptual dimensions nor holistic similarity could fully explain participants’ accuracy, suggesting that their mental models contain accurate information about emotion dynamics above and beyond what might be predicted by static emotion knowledge alone. PMID:28533373

  1. Return Predictability, Model Uncertainty, and Robust Investment

    DEFF Research Database (Denmark)

    Lukas, Manuel

    Stock return predictability is subject to great uncertainty. In this paper we use the model confidence set approach to quantify uncertainty about expected utility from investment, accounting for potential return predictability. For monthly US data and six representative return prediction models, we...... find that confidence sets are very wide, change significantly with the predictor variables, and frequently include expected utilities for which the investor prefers not to invest. The latter motivates a robust investment strategy maximizing the minimal element of the confidence set. The robust investor...... allocates a much lower share of wealth to stocks compared to a standard investor....

  2. Model predictive Controller for Mobile Robot

    OpenAIRE

    Alireza Rezaee

    2017-01-01

    This paper proposes a Model Predictive Controller (MPC) for control of a P2AT mobile robot. MPC refers to a group of controllers that employ a distinctly identical model of process to predict its future behavior over an extended prediction horizon. The design of a MPC is formulated as an optimal control problem. Then this problem is considered as linear quadratic equation (LQR) and is solved by making use of Ricatti equation. To show the effectiveness of the proposed method this controller is...

  3. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger.

    Science.gov (United States)

    Wright, James C; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J; Grigoriev, Igor V; Baker, Scott E; Beynon, Robert J; Hubbard, Simon J

    2009-02-04

    Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  4. Spatial Economics Model Predicting Transport Volume

    Directory of Open Access Journals (Sweden)

    Lu Bo

    2016-10-01

    Full Text Available It is extremely important to predict the logistics requirements in a scientific and rational way. However, in recent years, the improvement effect on the prediction method is not very significant and the traditional statistical prediction method has the defects of low precision and poor interpretation of the prediction model, which cannot only guarantee the generalization ability of the prediction model theoretically, but also cannot explain the models effectively. Therefore, in combination with the theories of the spatial economics, industrial economics, and neo-classical economics, taking city of Zhuanghe as the research object, the study identifies the leading industry that can produce a large number of cargoes, and further predicts the static logistics generation of the Zhuanghe and hinterlands. By integrating various factors that can affect the regional logistics requirements, this study established a logistics requirements potential model from the aspect of spatial economic principles, and expanded the way of logistics requirements prediction from the single statistical principles to an new area of special and regional economics.

  5. Accuracy assessment of landslide prediction models

    International Nuclear Information System (INIS)

    Othman, A N; Mohd, W M N W; Noraini, S

    2014-01-01

    The increasing population and expansion of settlements over hilly areas has greatly increased the impact of natural disasters such as landslide. Therefore, it is important to developed models which could accurately predict landslide hazard zones. Over the years, various techniques and models have been developed to predict landslide hazard zones. The aim of this paper is to access the accuracy of landslide prediction models developed by the authors. The methodology involved the selection of study area, data acquisition, data processing and model development and also data analysis. The development of these models are based on nine different landslide inducing parameters i.e. slope, land use, lithology, soil properties, geomorphology, flow accumulation, aspect, proximity to river and proximity to road. Rank sum, rating, pairwise comparison and AHP techniques are used to determine the weights for each of the parameters used. Four (4) different models which consider different parameter combinations are developed by the authors. Results obtained are compared to landslide history and accuracies for Model 1, Model 2, Model 3 and Model 4 are 66.7, 66.7%, 60% and 22.9% respectively. From the results, rank sum, rating and pairwise comparison can be useful techniques to predict landslide hazard zones

  6. Probability-based collaborative filtering model for predicting gene–disease associations

    OpenAIRE

    Zeng, Xiangxiang; Ding, Ningxiang; Rodríguez-Patón, Alfonso; Zou, Quan

    2017-01-01

    Background Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene–disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expenses. Methods We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data of humans and data of other nonhuman species, are integrated in our mo...

  7. Predictive minimum description length principle approach to inferring gene regulatory networks.

    Science.gov (United States)

    Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping

    2011-01-01

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared to existing MDL algorithm.

  8. Predictive validation of an influenza spread model.

    Directory of Open Access Journals (Sweden)

    Ayaz Hyder

    Full Text Available BACKGROUND: Modeling plays a critical role in mitigating impacts of seasonal influenza epidemics. Complex simulation models are currently at the forefront of evaluating optimal mitigation strategies at multiple scales and levels of organization. Given their evaluative role, these models remain limited in their ability to predict and forecast future epidemics leading some researchers and public-health practitioners to question their usefulness. The objective of this study is to evaluate the predictive ability of an existing complex simulation model of influenza spread. METHODS AND FINDINGS: We used extensive data on past epidemics to demonstrate the process of predictive validation. This involved generalizing an individual-based model for influenza spread and fitting it to laboratory-confirmed influenza infection data from a single observed epidemic (1998-1999. Next, we used the fitted model and modified two of its parameters based on data on real-world perturbations (vaccination coverage by age group and strain type. Simulating epidemics under these changes allowed us to estimate the deviation/error between the expected epidemic curve under perturbation and observed epidemics taking place from 1999 to 2006. Our model was able to forecast absolute intensity and epidemic peak week several weeks earlier with reasonable reliability and depended on the method of forecasting-static or dynamic. CONCLUSIONS: Good predictive ability of influenza epidemics is critical for implementing mitigation strategies in an effective and timely manner. Through the process of predictive validation applied to a current complex simulation model of influenza spread, we provided users of the model (e.g. public-health officials and policy-makers with quantitative metrics and practical recommendations on mitigating impacts of seasonal influenza epidemics. This methodology may be applied to other models of communicable infectious diseases to test and potentially improve

  9. Predictive Validation of an Influenza Spread Model

    Science.gov (United States)

    Hyder, Ayaz; Buckeridge, David L.; Leung, Brian

    2013-01-01

    Background Modeling plays a critical role in mitigating impacts of seasonal influenza epidemics. Complex simulation models are currently at the forefront of evaluating optimal mitigation strategies at multiple scales and levels of organization. Given their evaluative role, these models remain limited in their ability to predict and forecast future epidemics leading some researchers and public-health practitioners to question their usefulness. The objective of this study is to evaluate the predictive ability of an existing complex simulation model of influenza spread. Methods and Findings We used extensive data on past epidemics to demonstrate the process of predictive validation. This involved generalizing an individual-based model for influenza spread and fitting it to laboratory-confirmed influenza infection data from a single observed epidemic (1998–1999). Next, we used the fitted model and modified two of its parameters based on data on real-world perturbations (vaccination coverage by age group and strain type). Simulating epidemics under these changes allowed us to estimate the deviation/error between the expected epidemic curve under perturbation and observed epidemics taking place from 1999 to 2006. Our model was able to forecast absolute intensity and epidemic peak week several weeks earlier with reasonable reliability and depended on the method of forecasting-static or dynamic. Conclusions Good predictive ability of influenza epidemics is critical for implementing mitigation strategies in an effective and timely manner. Through the process of predictive validation applied to a current complex simulation model of influenza spread, we provided users of the model (e.g. public-health officials and policy-makers) with quantitative metrics and practical recommendations on mitigating impacts of seasonal influenza epidemics. This methodology may be applied to other models of communicable infectious diseases to test and potentially improve their predictive

  10. Finding Furfural Hydrogenation Catalysts via Predictive Modelling.

    Science.gov (United States)

    Strassberger, Zea; Mooijman, Maurice; Ruijter, Eelco; Alberts, Albert H; Maldonado, Ana G; Orru, Romano V A; Rothenberg, Gadi

    2010-09-10

    We combine multicomponent reactions, catalytic performance studies and predictive modelling to find transfer hydrogenation catalysts. An initial set of 18 ruthenium-carbene complexes were synthesized and screened in the transfer hydrogenation of furfural to furfurol with isopropyl alcohol complexes gave varied yields, from 62% up to >99.9%, with no obvious structure/activity correlations. Control experiments proved that the carbene ligand remains coordinated to the ruthenium centre throughout the reaction. Deuterium-labelling studies showed a secondary isotope effect (k(H):k(D)=1.5). Further mechanistic studies showed that this transfer hydrogenation follows the so-called monohydride pathway. Using these data, we built a predictive model for 13 of the catalysts, based on 2D and 3D molecular descriptors. We tested and validated the model using the remaining five catalysts (cross-validation, R(2)=0.913). Then, with this model, the conversion and selectivity were predicted for four completely new ruthenium-carbene complexes. These four catalysts were then synthesized and tested. The results were within 3% of the model's predictions, demonstrating the validity and value of predictive modelling in catalyst optimization.

  11. Corporate prediction models, ratios or regression analysis?

    NARCIS (Netherlands)

    Bijnen, E.J.; Wijn, M.F.C.M.

    1994-01-01

    The models developed in the literature with respect to the prediction of a company s failure are based on ratios. It has been shown before that these models should be rejected on theoretical grounds. Our study of industrial companies in the Netherlands shows that the ratios which are used in

  12. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....

  13. Energy based prediction models for building acoustics

    DEFF Research Database (Denmark)

    Brunskog, Jonas

    2012-01-01

    In order to reach robust and simplified yet accurate prediction models, energy based principle are commonly used in many fields of acoustics, especially in building acoustics. This includes simple energy flow models, the framework of statistical energy analysis (SEA) as well as more elaborated...... principles as, e.g., wave intensity analysis (WIA). The European standards for building acoustic predictions, the EN 12354 series, are based on energy flow and SEA principles. In the present paper, different energy based prediction models are discussed and critically reviewed. Special attention is placed...... on underlying basic assumptions, such as diffuse fields, high modal overlap, resonant field being dominant, etc., and the consequences of these in terms of limitations in the theory and in the practical use of the models....

  14. Comparative Study of Bancruptcy Prediction Models

    Directory of Open Access Journals (Sweden)

    Isye Arieshanti

    2013-09-01

    Full Text Available Early indication of bancruptcy is important for a company. If companies aware of  potency of their bancruptcy, they can take a preventive action to anticipate the bancruptcy. In order to detect the potency of a bancruptcy, a company can utilize a a model of bancruptcy prediction. The prediction model can be built using a machine learning methods. However, the choice of machine learning methods should be performed carefully. Because the suitability of a model depends on the problem specifically. Therefore, in this paper we perform a comparative study of several machine leaning methods for bancruptcy prediction. According to the comparative study, the performance of several models that based on machine learning methods (k-NN, fuzzy k-NN, SVM, Bagging Nearest Neighbour SVM, Multilayer Perceptron(MLP, Hybrid of MLP + Multiple Linear Regression, it can be showed that fuzzy k-NN method achieve the best performance with accuracy 77.5%

  15. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    Energy Technology Data Exchange (ETDEWEB)

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  16. Prediction Models for Dynamic Demand Response

    Energy Technology Data Exchange (ETDEWEB)

    Aman, Saima; Frincu, Marc; Chelmis, Charalampos; Noor, Muhammad; Simmhan, Yogesh; Prasanna, Viktor K.

    2015-11-02

    As Smart Grids move closer to dynamic curtailment programs, Demand Response (DR) events will become necessary not only on fixed time intervals and weekdays predetermined by static policies, but also during changing decision periods and weekends to react to real-time demand signals. Unique challenges arise in this context vis-a-vis demand prediction and curtailment estimation and the transformation of such tasks into an automated, efficient dynamic demand response (D2R) process. While existing work has concentrated on increasing the accuracy of prediction models for DR, there is a lack of studies for prediction models for D2R, which we address in this paper. Our first contribution is the formal definition of D2R, and the description of its challenges and requirements. Our second contribution is a feasibility analysis of very-short-term prediction of electricity consumption for D2R over a diverse, large-scale dataset that includes both small residential customers and large buildings. Our third, and major contribution is a set of insights into the predictability of electricity consumption in the context of D2R. Specifically, we focus on prediction models that can operate at a very small data granularity (here 15-min intervals), for both weekdays and weekends - all conditions that characterize scenarios for D2R. We find that short-term time series and simple averaging models used by Independent Service Operators and utilities achieve superior prediction accuracy. We also observe that workdays are more predictable than weekends and holiday. Also, smaller customers have large variation in consumption and are less predictable than larger buildings. Key implications of our findings are that better models are required for small customers and for non-workdays, both of which are critical for D2R. Also, prediction models require just few days’ worth of data indicating that small amounts of

  17. Evaluation of CASP8 model quality predictions

    KAUST Repository

    Cozzetto, Domenico

    2009-01-01

    The model quality assessment problem consists in the a priori estimation of the overall and per-residue accuracy of protein structure predictions. Over the past years, a number of methods have been developed to address this issue and CASP established a prediction category to evaluate their performance in 2006. In 2008 the experiment was repeated and its results are reported here. Participants were invited to infer the correctness of the protein models submitted by the registered automatic servers. Estimates could apply to both whole models and individual amino acids. Groups involved in the tertiary structure prediction categories were also asked to assign local error estimates to each predicted residue in their own models and their results are also discussed here. The correlation between the predicted and observed correctness measures was the basis of the assessment of the results. We observe that consensus-based methods still perform significantly better than those accepting single models, similarly to what was concluded in the previous edition of the experiment. © 2009 WILEY-LISS, INC.

  18. Finding Furfural Hydrogenation Catalysts via Predictive Modelling

    Science.gov (United States)

    Strassberger, Zea; Mooijman, Maurice; Ruijter, Eelco; Alberts, Albert H; Maldonado, Ana G; Orru, Romano V A; Rothenberg, Gadi

    2010-01-01

    Abstract We combine multicomponent reactions, catalytic performance studies and predictive modelling to find transfer hydrogenation catalysts. An initial set of 18 ruthenium-carbene complexes were synthesized and screened in the transfer hydrogenation of furfural to furfurol with isopropyl alcohol complexes gave varied yields, from 62% up to >99.9%, with no obvious structure/activity correlations. Control experiments proved that the carbene ligand remains coordinated to the ruthenium centre throughout the reaction. Deuterium-labelling studies showed a secondary isotope effect (kH:kD=1.5). Further mechanistic studies showed that this transfer hydrogenation follows the so-called monohydride pathway. Using these data, we built a predictive model for 13 of the catalysts, based on 2D and 3D molecular descriptors. We tested and validated the model using the remaining five catalysts (cross-validation, R2=0.913). Then, with this model, the conversion and selectivity were predicted for four completely new ruthenium-carbene complexes. These four catalysts were then synthesized and tested. The results were within 3% of the model’s predictions, demonstrating the validity and value of predictive modelling in catalyst optimization. PMID:23193388

  19. Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

    DEFF Research Database (Denmark)

    Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H

    2006-01-01

    This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci...... with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST...

  20. Wind farm production prediction - The Zephyr model

    Energy Technology Data Exchange (ETDEWEB)

    Landberg, L. [Risoe National Lab., Wind Energy Dept., Roskilde (Denmark); Giebel, G. [Risoe National Lab., Wind Energy Dept., Roskilde (Denmark); Madsen, H. [IMM (DTU), Kgs. Lyngby (Denmark); Nielsen, T.S. [IMM (DTU), Kgs. Lyngby (Denmark); Joergensen, J.U. [Danish Meteorologisk Inst., Copenhagen (Denmark); Lauersen, L. [Danish Meteorologisk Inst., Copenhagen (Denmark); Toefting, J. [Elsam, Fredericia (DK); Christensen, H.S. [Eltra, Fredericia (Denmark); Bjerge, C. [SEAS, Haslev (Denmark)

    2002-06-01

    This report describes a project - funded by the Danish Ministry of Energy and the Environment - which developed a next generation prediction system called Zephyr. The Zephyr system is a merging between two state-of-the-art prediction systems: Prediktor of Risoe National Laboratory and WPPT of IMM at the Danish Technical University. The numerical weather predictions were generated by DMI's HIRLAM model. Due to technical difficulties programming the system, only the computational core and a very simple version of the originally very complex system were developed. The project partners were: Risoe, DMU, DMI, Elsam, Eltra, Elkraft System, SEAS and E2. (au)

  1. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Science.gov (United States)

    Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

    2012-01-01

    MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.

  2. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Directory of Open Access Journals (Sweden)

    Jens Roat Kultima

    Full Text Available MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.

  3. Model predictive controller design of hydrocracker reactors

    OpenAIRE

    GÖKÇE, Dila

    2011-01-01

    This study summarizes the design of a Model Predictive Controller (MPC) in Tüpraş, İzmit Refinery Hydrocracker Unit Reactors. Hydrocracking process, in which heavy vacuum gasoil is converted into lighter and valuable products at high temperature and pressure is described briefly. Controller design description, identification and modeling studies are examined and the model variables are presented. WABT (Weighted Average Bed Temperature) equalization and conversion increase are simulate...

  4. Sparsity in Model Gene Regulatory Networks

    International Nuclear Information System (INIS)

    Zagorski, M.

    2011-01-01

    We propose a gene regulatory network model which incorporates the microscopic interactions between genes and transcription factors. In particular the gene's expression level is determined by deterministic synchronous dynamics with contribution from excitatory interactions. We study the structure of networks that have a particular '' function '' and are subject to the natural selection pressure. The question of network robustness against point mutations is addressed, and we conclude that only a small part of connections defined as '' essential '' for cell's existence is fragile. Additionally, the obtained networks are sparse with narrow in-degree and broad out-degree, properties well known from experimental study of biological regulatory networks. Furthermore, during sampling procedure we observe that significantly different genotypes can emerge under mutation-selection balance. All the preceding features hold for the model parameters which lay in the experimentally relevant range. (author)

  5. Multi-Model Ensemble Wake Vortex Prediction

    Science.gov (United States)

    Koerner, Stephan; Holzaepfel, Frank; Ahmad, Nash'at N.

    2015-01-01

    Several multi-model ensemble methods are investigated for predicting wake vortex transport and decay. This study is a joint effort between National Aeronautics and Space Administration and Deutsches Zentrum fuer Luft- und Raumfahrt to develop a multi-model ensemble capability using their wake models. An overview of different multi-model ensemble methods and their feasibility for wake applications is presented. The methods include Reliability Ensemble Averaging, Bayesian Model Averaging, and Monte Carlo Simulations. The methodologies are evaluated using data from wake vortex field experiments.

  6. CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

    Science.gov (United States)

    Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A

    2012-07-01

    Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

  7. Risk terrain modeling predicts child maltreatment.

    Science.gov (United States)

    Daley, Dyann; Bachmann, Michael; Bachmann, Brittany A; Pedigo, Christian; Bui, Minh-Thuy; Coffman, Jamye

    2016-12-01

    As indicated by research on the long-term effects of adverse childhood experiences (ACEs), maltreatment has far-reaching consequences for affected children. Effective prevention measures have been elusive, partly due to difficulty in identifying vulnerable children before they are harmed. This study employs Risk Terrain Modeling (RTM), an analysis of the cumulative effect of environmental factors thought to be conducive for child maltreatment, to create a highly accurate prediction model for future substantiated child maltreatment cases in the City of Fort Worth, Texas. The model is superior to commonly used hotspot predictions and more beneficial in aiding prevention efforts in a number of ways: 1) it identifies the highest risk areas for future instances of child maltreatment with improved precision and accuracy; 2) it aids the prioritization of risk-mitigating efforts by informing about the relative importance of the most significant contributing risk factors; 3) since predictions are modeled as a function of easily obtainable data, practitioners do not have to undergo the difficult process of obtaining official child maltreatment data to apply it; 4) the inclusion of a multitude of environmental risk factors creates a more robust model with higher predictive validity; and, 5) the model does not rely on a retrospective examination of past instances of child maltreatment, but adapts predictions to changing environmental conditions. The present study introduces and examines the predictive power of this new tool to aid prevention efforts seeking to improve the safety, health, and wellbeing of vulnerable children. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  8. Computational challenges in modeling gene regulatory events.

    Science.gov (United States)

    Pataskar, Abhijeet; Tiwari, Vijay K

    2016-10-19

    Cellular transcriptional programs driven by genetic and epigenetic mechanisms could be better understood by integrating "omics" data and subsequently modeling the gene-regulatory events. Toward this end, computational biology should keep pace with evolving experimental procedures and data availability. This article gives an exemplified account of the current computational challenges in molecular biology.

  9. Estimating confidence intervals in predicted responses for oscillatory biological models.

    Science.gov (United States)

    St John, Peter C; Doyle, Francis J

    2013-07-29

    The dynamics of gene regulation play a crucial role in a cellular control: allowing the cell to express the right proteins to meet changing needs. Some needs, such as correctly anticipating the day-night cycle, require complicated oscillatory features. In the analysis of gene regulatory networks, mathematical models are frequently used to understand how a network's structure enables it to respond appropriately to external inputs. These models typically consist of a set of ordinary differential equations, describing a network of biochemical reactions, and unknown kinetic parameters, chosen such that the model best captures experimental data. However, since a model's parameter values are uncertain, and since dynamic responses to inputs are highly parameter-dependent, it is difficult to assess the confidence associated with these in silico predictions. In particular, models with complex dynamics - such as oscillations - must be fit with computationally expensive global optimization routines, and cannot take advantage of existing measures of identifiability. Despite their difficulty to model mathematically, limit cycle oscillations play a key role in many biological processes, including cell cycling, metabolism, neuron firing, and circadian rhythms. In this study, we employ an efficient parameter estimation technique to enable a bootstrap uncertainty analysis for limit cycle models. Since the primary role of systems biology models is the insight they provide on responses to rate perturbations, we extend our uncertainty analysis to include first order sensitivity coefficients. Using a literature model of circadian rhythms, we show how predictive precision is degraded with decreasing sample points and increasing relative error. Additionally, we show how this method can be used for model discrimination by comparing the output identifiability of two candidate model structures to published literature data. Our method permits modellers of oscillatory systems to confidently

  10. PREDICTIVE CAPACITY OF ARCH FAMILY MODELS

    Directory of Open Access Journals (Sweden)

    Raphael Silveira Amaro

    2016-03-01

    Full Text Available In the last decades, a remarkable number of models, variants from the Autoregressive Conditional Heteroscedastic family, have been developed and empirically tested, making extremely complex the process of choosing a particular model. This research aim to compare the predictive capacity, using the Model Confidence Set procedure, than five conditional heteroskedasticity models, considering eight different statistical probability distributions. The financial series which were used refers to the log-return series of the Bovespa index and the Dow Jones Industrial Index in the period between 27 October 2008 and 30 December 2014. The empirical evidences showed that, in general, competing models have a great homogeneity to make predictions, either for a stock market of a developed country or for a stock market of a developing country. An equivalent result can be inferred for the statistical probability distributions that were used.

  11. Alcator C-Mod predictive modeling

    International Nuclear Information System (INIS)

    Pankin, Alexei; Bateman, Glenn; Kritz, Arnold; Greenwald, Martin; Snipes, Joseph; Fredian, Thomas

    2001-01-01

    Predictive simulations for the Alcator C-mod tokamak [I. Hutchinson et al., Phys. Plasmas 1, 1511 (1994)] are carried out using the BALDUR integrated modeling code [C. E. Singer et al., Comput. Phys. Commun. 49, 275 (1988)]. The results are obtained for temperature and density profiles using the Multi-Mode transport model [G. Bateman et al., Phys. Plasmas 5, 1793 (1998)] as well as the mixed-Bohm/gyro-Bohm transport model [M. Erba et al., Plasma Phys. Controlled Fusion 39, 261 (1997)]. The simulated discharges are characterized by very high plasma density in both low and high modes of confinement. The predicted profiles for each of the transport models match the experimental data about equally well in spite of the fact that the two models have different dimensionless scalings. Average relative rms deviations are less than 8% for the electron density profiles and 16% for the electron and ion temperature profiles

  12. Simple mathematical models of gene regulatory dynamics

    CERN Document Server

    Mackey, Michael C; Tyran-Kamińska, Marta; Zeron, Eduardo S

    2016-01-01

    This is a short and self-contained introduction to the field of mathematical modeling of gene-networks in bacteria. As an entry point to the field, we focus on the analysis of simple gene-network dynamics. The notes commence with an introduction to the deterministic modeling of gene-networks, with extensive reference to applicable results coming from dynamical systems theory. The second part of the notes treats extensively several approaches to the study of gene-network dynamics in the presence of noise—either arising from low numbers of molecules involved, or due to noise external to the regulatory process. The third and final part of the notes gives a detailed treatment of three well studied and concrete examples of gene-network dynamics by considering the lactose operon, the tryptophan operon, and the lysis-lysogeny switch. The notes contain an index for easy location of particular topics as well as an extensive bibliography of the current literature. The target audience of these notes are mainly graduat...

  13. A Computational Gene Expression Score for Predicting Immune Injury in Renal Allografts.

    Directory of Open Access Journals (Sweden)

    Tara K Sigdel

    Full Text Available Whole genome microarray meta-analyses of 1030 kidney, heart, lung and liver allograft biopsies identified a common immune response module (CRM of 11 genes that define acute rejection (AR across different engrafted tissues. We evaluated if the CRM genes can provide a molecular microscope to quantify graft injury in acute rejection (AR and predict risk of progressive interstitial fibrosis and tubular atrophy (IFTA in histologically normal kidney biopsies.Computational modeling was done on tissue qPCR based gene expression measurements for the 11 CRM genes in 146 independent renal allografts from 122 unique patients with AR (n = 54 and no-AR (n = 92. 24 demographically matched patients with no-AR had 6 and 24 month paired protocol biopsies; all had histologically normal 6 month biopsies, and 12 had evidence of progressive IFTA (pIFTA on their 24 month biopsies. Results were correlated with demographic, clinical and pathology variables.The 11 gene qPCR based tissue CRM score (tCRM was significantly increased in AR (5.68 ± 0.91 when compared to STA (1.29 ± 0.28; p < 0.001 and pIFTA (7.94 ± 2.278 versus 2.28 ± 0.66; p = 0.04, with greatest significance for CXCL9 and CXCL10 in AR (p <0.001 and CD6 (p<0.01, CXCL9 (p<0.05, and LCK (p<0.01 in pIFTA. tCRM was a significant independent correlate of biopsy confirmed AR (p < 0.001; AUC of 0.900; 95% CI = 0.705-903. Gene expression modeling of 6 month biopsies across 7/11 genes (CD6, INPP5D, ISG20, NKG7, PSMB9, RUNX3, and TAP1 significantly (p = 0.037 predicted the development of pIFTA at 24 months.Genome-wide tissue gene expression data mining has supported the development of a tCRM-qPCR based assay for evaluating graft immune inflammation. The tCRM score quantifies injury in AR and stratifies patients at increased risk of future pIFTA prior to any perturbation of graft function or histology.

  14. Predicting gene function using hierarchical multi-label decision tree ensembles

    Directory of Open Access Journals (Sweden)

    Kocev Dragi

    2010-01-01

    Full Text Available Abstract Background S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO. We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

  15. Modelling the predictive performance of credit scoring

    Directory of Open Access Journals (Sweden)

    Shi-Wei Shen

    2013-07-01

    Research purpose: The purpose of this empirical paper was to examine the predictive performance of credit scoring systems in Taiwan. Motivation for the study: Corporate lending remains a major business line for financial institutions. However, in light of the recent global financial crises, it has become extremely important for financial institutions to implement rigorous means of assessing clients seeking access to credit facilities. Research design, approach and method: Using a data sample of 10 349 observations drawn between 1992 and 2010, logistic regression models were utilised to examine the predictive performance of credit scoring systems. Main findings: A test of Goodness of fit demonstrated that credit scoring models that incorporated the Taiwan Corporate Credit Risk Index (TCRI, micro- and also macroeconomic variables possessed greater predictive power. This suggests that macroeconomic variables do have explanatory power for default credit risk. Practical/managerial implications: The originality in the study was that three models were developed to predict corporate firms’ defaults based on different microeconomic and macroeconomic factors such as the TCRI, asset growth rates, stock index and gross domestic product. Contribution/value-add: The study utilises different goodness of fits and receiver operator characteristics during the examination of the robustness of the predictive power of these factors.

  16. Sequence-based model of gap gene regulatory network.

    Science.gov (United States)

    Kozlov, Konstantin; Gursky, Vitaly; Kulakovskiy, Ivan; Samsonova, Maria

    2014-01-01

    The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts large interest among researches studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer underlying regulatory mechanism and prove its sufficiency. We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used four-fold cross validation test and fitting to random dataset to validate the model and proof its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3

  17. Comparison of two ordinal prediction models

    DEFF Research Database (Denmark)

    Kattan, Michael W; Gerds, Thomas A

    2015-01-01

    system (i.e. old or new), such as the level of evidence for one or more factors included in the system or the general opinions of expert clinicians. However, given the major objective of estimating prognosis on an ordinal scale, we argue that the rival staging system candidates should be compared...... on their ability to predict outcome. We sought to outline an algorithm that would compare two rival ordinal systems on their predictive ability. RESULTS: We devised an algorithm based largely on the concordance index, which is appropriate for comparing two models in their ability to rank observations. We...... demonstrate our algorithm with a prostate cancer staging system example. CONCLUSION: We have provided an algorithm for selecting the preferred staging system based on prognostic accuracy. It appears to be useful for the purpose of selecting between two ordinal prediction models....

  18. Predictive analytics can support the ACO model.

    Science.gov (United States)

    Bradley, Paul

    2012-04-01

    Predictive analytics can be used to rapidly spot hard-to-identify opportunities to better manage care--a key tool in accountable care. When considering analytics models, healthcare providers should: Make value-based care a priority and act on information from analytics models. Create a road map that includes achievable steps, rather than major endeavors. Set long-term expectations and recognize that the effectiveness of an analytics program takes time, unlike revenue cycle initiatives that may show a quick return.

  19. Predictive performance models and multiple task performance

    Science.gov (United States)

    Wickens, Christopher D.; Larish, Inge; Contorer, Aaron

    1989-01-01

    Five models that predict how performance of multiple tasks will interact in complex task scenarios are discussed. The models are shown in terms of the assumptions they make about human operator divided attention. The different assumptions about attention are then empirically validated in a multitask helicopter flight simulation. It is concluded from this simulation that the most important assumption relates to the coding of demand level of different component tasks.

  20. Model Predictive Control of Sewer Networks

    DEFF Research Database (Denmark)

    Pedersen, Einar B.; Herbertsson, Hannes R.; Niemann, Henrik

    2016-01-01

    The developments in solutions for management of urban drainage are of vital importance, as the amount of sewer water from urban areas continues to increase due to the increase of the world’s population and the change in the climate conditions. How a sewer network is structured, monitored and cont...... benchmark model. Due to the inherent constraints the applied approach is based on Model Predictive Control....

  1. Distributed Model Predictive Control via Dual Decomposition

    DEFF Research Database (Denmark)

    Biegel, Benjamin; Stoustrup, Jakob; Andersen, Palle

    2014-01-01

    This chapter presents dual decomposition as a means to coordinate a number of subsystems coupled by state and input constraints. Each subsystem is equipped with a local model predictive controller while a centralized entity manages the subsystems via prices associated with the coupling constraints...

  2. A quantitative and dynamic model of the Arabidopsis flowering time gene regulatory network.

    Directory of Open Access Journals (Sweden)

    Felipe Leal Valentim

    Full Text Available Various environmental signals integrate into a network of floral regulatory genes leading to the final decision on when to flower. Although a wealth of qualitative knowledge is available on how flowering time genes regulate each other, only a few studies incorporated this knowledge into predictive models. Such models are invaluable as they enable to investigate how various types of inputs are combined to give a quantitative readout. To investigate the effect of gene expression disturbances on flowering time, we developed a dynamic model for the regulation of flowering time in Arabidopsis thaliana. Model parameters were estimated based on expression time-courses for relevant genes, and a consistent set of flowering times for plants of various genetic backgrounds. Validation was performed by predicting changes in expression level in mutant backgrounds and comparing these predictions with independent expression data, and by comparison of predicted and experimental flowering times for several double mutants. Remarkably, the model predicts that a disturbance in a particular gene has not necessarily the largest impact on directly connected genes. For example, the model predicts that SUPPRESSOR OF OVEREXPRESSION OF CONSTANS (SOC1 mutation has a larger impact on APETALA1 (AP1, which is not directly regulated by SOC1, compared to its effect on LEAFY (LFY which is under direct control of SOC1. This was confirmed by expression data. Another model prediction involves the importance of cooperativity in the regulation of APETALA1 (AP1 by LFY, a prediction supported by experimental evidence. Concluding, our model for flowering time gene regulation enables to address how different quantitative inputs are combined into one quantitative output, flowering time.

  3. Combinatorial explosion in model gene networks

    Science.gov (United States)

    Edwards, R.; Glass, L.

    2000-09-01

    The explosive growth in knowledge of the genome of humans and other organisms leaves open the question of how the functioning of genes in interacting networks is coordinated for orderly activity. One approach to this problem is to study mathematical properties of abstract network models that capture the logical structures of gene networks. The principal issue is to understand how particular patterns of activity can result from particular network structures, and what types of behavior are possible. We study idealized models in which the logical structure of the network is explicitly represented by Boolean functions that can be represented by directed graphs on n-cubes, but which are continuous in time and described by differential equations, rather than being updated synchronously via a discrete clock. The equations are piecewise linear, which allows significant analysis and facilitates rapid integration along trajectories. We first give a combinatorial solution to the question of how many distinct logical structures exist for n-dimensional networks, showing that the number increases very rapidly with n. We then outline analytic methods that can be used to establish the existence, stability and periods of periodic orbits corresponding to particular cycles on the n-cube. We use these methods to confirm the existence of limit cycles discovered in a sample of a million randomly generated structures of networks of 4 genes. Even with only 4 genes, at least several hundred different patterns of stable periodic behavior are possible, many of them surprisingly complex. We discuss ways of further classifying these periodic behaviors, showing that small mutations (reversal of one or a few edges on the n-cube) need not destroy the stability of a limit cycle. Although these networks are very simple as models of gene networks, their mathematical transparency reveals relationships between structure and behavior, they suggest that the possibilities for orderly dynamics in such

  4. De novo transcriptome sequencing and digital gene expression analysis predict biosynthetic pathway of rhynchophylline and isorhynchophylline from Uncaria rhynchophylla, a non-model plant with potent anti-alzheimer's properties.

    Science.gov (United States)

    Guo, Qianqian; Ma, Xiaojun; Wei, Shugen; Qiu, Deyou; Wilson, Iain W; Wu, Peng; Tang, Qi; Liu, Lijun; Dong, Shoukun; Zu, Wei

    2014-08-12

    The major medicinal alkaloids isolated from Uncaria rhynchophylla (gouteng in chinese) capsules are rhynchophylline (RIN) and isorhynchophylline (IRN). Extracts containing these terpene indole alkaloids (TIAs) can inhibit the formation and destabilize preformed fibrils of amyloid β protein (a pathological marker of Alzheimer's disease), and have been shown to improve the cognitive function of mice with Alzheimer-like symptoms. The biosynthetic pathways of RIN and IRN are largely unknown. In this study, RNA-sequencing of pooled Uncaria capsules RNA samples taken at three developmental stages that accumulate different amount of RIN and IRN was performed. More than 50 million high-quality reads from a cDNA library were generated and de novo assembled. Sequences for all of the known enzymes involved in TIAs synthesis were identified. Additionally, 193 cytochrome P450 (CYP450), 280 methyltransferase and 144 isomerase genes were identified, that are potential candidates for enzymes involved in RIN and IRN synthesis. Digital gene expression profile (DGE) analysis was performed on the three capsule developmental stages, and based on genes possessing expression profiles consistent with RIN and IRN levels; four CYP450s, three methyltransferases and three isomerases were identified as the candidates most likely to be involved in the later steps of RIN and IRN biosynthesis. A combination of de novo transcriptome assembly and DGE analysis was shown to be a powerful method for identifying genes encoding enzymes potentially involved in the biosynthesis of important secondary metabolites in a non-model plant. The transcriptome data from this study provides an important resource for understanding the formation of major bioactive constituents in the capsule extract from Uncaria, and provides information that may aid in metabolic engineering to increase yields of these important alkaloids.

  5. Dichotomous noise models of gene switches

    Energy Technology Data Exchange (ETDEWEB)

    Potoyan, Davit A., E-mail: potoyan@rice.edu; Wolynes, Peter G., E-mail: pwolynes@rice.edu [Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005 (United States)

    2015-11-21

    Molecular noise in gene regulatory networks has two intrinsic components, one part being due to fluctuations caused by the birth and death of protein or mRNA molecules which are often present in small numbers and the other part arising from gene state switching, a single molecule event. Stochastic dynamics of gene regulatory circuits appears to be largely responsible for bifurcations into a set of multi-attractor states that encode different cell phenotypes. The interplay of dichotomous single molecule gene noise with the nonlinear architecture of genetic networks generates rich and complex phenomena. In this paper, we elaborate on an approximate framework that leads to simple hybrid multi-scale schemes well suited for the quantitative exploration of the steady state properties of large-scale cellular genetic circuits. Through a path sum based analysis of trajectory statistics, we elucidate the connection of these hybrid schemes to the underlying master equation and provide a rigorous justification for using dichotomous noise based models to study genetic networks. Numerical simulations of circuit models reveal that the contribution of the genetic noise of single molecule origin to the total noise is significant for a wide range of kinetic regimes.

  6. Electrostatic ion thrusters - towards predictive modeling

    Energy Technology Data Exchange (ETDEWEB)

    Kalentev, O.; Matyash, K.; Duras, J.; Lueskow, K.F.; Schneider, R. [Ernst-Moritz-Arndt Universitaet Greifswald, D-17489 (Germany); Koch, N. [Technische Hochschule Nuernberg Georg Simon Ohm, Kesslerplatz 12, D-90489 Nuernberg (Germany); Schirra, M. [Thales Electronic Systems GmbH, Soeflinger Strasse 100, D-89077 Ulm (Germany)

    2014-02-15

    The development of electrostatic ion thrusters so far has mainly been based on empirical and qualitative know-how, and on evolutionary iteration steps. This resulted in considerable effort regarding prototype design, construction and testing and therefore in significant development and qualification costs and high time demands. For future developments it is anticipated to implement simulation tools which allow for quantitative prediction of ion thruster performance, long-term behavior and space craft interaction prior to hardware design and construction. Based on integrated numerical models combining self-consistent kinetic plasma models with plasma-wall interaction modules a new quality in the description of electrostatic thrusters can be reached. These open the perspective for predictive modeling in this field. This paper reviews the application of a set of predictive numerical modeling tools on an ion thruster model of the HEMP-T (High Efficiency Multi-stage Plasma Thruster) type patented by Thales Electron Devices GmbH. (copyright 2014 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)

  7. An Intelligent Model for Stock Market Prediction

    Directory of Open Access Journals (Sweden)

    IbrahimM. Hamed

    2012-08-01

    Full Text Available This paper presents an intelligent model for stock market signal prediction using Multi-Layer Perceptron (MLP Artificial Neural Networks (ANN. Blind source separation technique, from signal processing, is integrated with the learning phase of the constructed baseline MLP ANN to overcome the problems of prediction accuracy and lack of generalization. Kullback Leibler Divergence (KLD is used, as a learning algorithm, because it converges fast and provides generalization in the learning mechanism. Both accuracy and efficiency of the proposed model were confirmed through the Microsoft stock, from wall-street market, and various data sets, from different sectors of the Egyptian stock market. In addition, sensitivity analysis was conducted on the various parameters of the model to ensure the coverage of the generalization issue. Finally, statistical significance was examined using ANOVA test.

  8. Predictive Models, How good are they?

    DEFF Research Database (Denmark)

    Kasch, Helge

    The WAD grading system has been used for more than 20 years by now. It has shown long-term viability, but with strengths and limitations. New bio-psychosocial assessment of the acute whiplash injured subject may provide better prediction of long-term disability and pain. Furthermore, the emerging......-up. It is important to obtain prospective identification of the relevant risk underreported disability could, if we were able to expose these hidden “risk-factors” during our consultations, provide us with better predictive models. New data from large clinical studies will present exciting new genetic risk markers...

  9. NONLINEAR MODEL PREDICTIVE CONTROL OF CHEMICAL PROCESSES

    Directory of Open Access Journals (Sweden)

    SILVA R. G.

    1999-01-01

    Full Text Available A new algorithm for model predictive control is presented. The algorithm utilizes a simultaneous solution and optimization strategy to solve the model's differential equations. The equations are discretized by equidistant collocation, and along with the algebraic model equations are included as constraints in a nonlinear programming (NLP problem. This algorithm is compared with the algorithm that uses orthogonal collocation on finite elements. The equidistant collocation algorithm results in simpler equations, providing a decrease in computation time for the control moves. Simulation results are presented and show a satisfactory performance of this algorithm.

  10. A statistical model for predicting muscle performance

    Science.gov (United States)

    Byerly, Diane Leslie De Caix

    The objective of these studies was to develop a capability for predicting muscle performance and fatigue to be utilized for both space- and ground-based applications. To develop this predictive model, healthy test subjects performed a defined, repetitive dynamic exercise to failure using a Lordex spinal machine. Throughout the exercise, surface electromyography (SEMG) data were collected from the erector spinae using a Mega Electronics ME3000 muscle tester and surface electrodes placed on both sides of the back muscle. These data were analyzed using a 5th order Autoregressive (AR) model and statistical regression analysis. It was determined that an AR derived parameter, the mean average magnitude of AR poles, significantly correlated with the maximum number of repetitions (designated Rmax) that a test subject was able to perform. Using the mean average magnitude of AR poles, a test subject's performance to failure could be predicted as early as the sixth repetition of the exercise. This predictive model has the potential to provide a basis for improving post-space flight recovery, monitoring muscle atrophy in astronauts and assessing the effectiveness of countermeasures, monitoring astronaut performance and fatigue during Extravehicular Activity (EVA) operations, providing pre-flight assessment of the ability of an EVA crewmember to perform a given task, improving the design of training protocols and simulations for strenuous International Space Station assembly EVA, and enabling EVA work task sequences to be planned enhancing astronaut performance and safety. Potential ground-based, medical applications of the predictive model include monitoring muscle deterioration and performance resulting from illness, establishing safety guidelines in the industry for repetitive tasks, monitoring the stages of rehabilitation for muscle-related injuries sustained in sports and accidents, and enhancing athletic performance through improved training protocols while reducing

  11. Kinetic models of gene expression including non-coding RNAs

    Energy Technology Data Exchange (ETDEWEB)

    Zhdanov, Vladimir P., E-mail: zhdanov@catalysis.r

    2011-03-15

    In cells, genes are transcribed into mRNAs, and the latter are translated into proteins. Due to the feedbacks between these processes, the kinetics of gene expression may be complex even in the simplest genetic networks. The corresponding models have already been reviewed in the literature. A new avenue in this field is related to the recognition that the conventional scenario of gene expression is fully applicable only to prokaryotes whose genomes consist of tightly packed protein-coding sequences. In eukaryotic cells, in contrast, such sequences are relatively rare, and the rest of the genome includes numerous transcript units representing non-coding RNAs (ncRNAs). During the past decade, it has become clear that such RNAs play a crucial role in gene expression and accordingly influence a multitude of cellular processes both in the normal state and during diseases. The numerous biological functions of ncRNAs are based primarily on their abilities to silence genes via pairing with a target mRNA and subsequently preventing its translation or facilitating degradation of the mRNA-ncRNA complex. Many other abilities of ncRNAs have been discovered as well. Our review is focused on the available kinetic models describing the mRNA, ncRNA and protein interplay. In particular, we systematically present the simplest models without kinetic feedbacks, models containing feedbacks and predicting bistability and oscillations in simple genetic networks, and models describing the effect of ncRNAs on complex genetic networks. Mathematically, the presentation is based primarily on temporal mean-field kinetic equations. The stochastic and spatio-temporal effects are also briefly discussed.

  12. Prediction models : the right tool for the right problem

    NARCIS (Netherlands)

    Kappen, Teus H.; Peelen, Linda M.

    2016-01-01

    PURPOSE OF REVIEW: Perioperative prediction models can help to improve personalized patient care by providing individual risk predictions to both patients and providers. However, the scientific literature on prediction model development and validation can be quite technical and challenging to

  13. A polynomial based model for cell fate prediction in human diseases.

    Science.gov (United States)

    Ma, Lichun; Zheng, Jie

    2017-12-21

    Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decision sheds light on key regulators, facilitates understanding the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we proposed a polynomial based model to predict cell fate. This model was derived from Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, within both the two considered gene selection methods, the prediction accuracies of polynomials of different degrees show little differences. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than others. When comparing the linear polynomials based on the two gene selection methods, it shows that although the accuracy of the linear polynomial that uses correlation analysis outcomes is a little higher (achieves 86.62%), the one within genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical study of cell development related diseases.

  14. Neuro-fuzzy modeling in bankruptcy prediction

    Directory of Open Access Journals (Sweden)

    Vlachos D.

    2003-01-01

    Full Text Available For the past 30 years the problem of bankruptcy prediction had been thoroughly studied. From the paper of Altman in 1968 to the recent papers in the '90s, the progress of prediction accuracy was not satisfactory. This paper investigates an alternative modeling of the system (firm, combining neural networks and fuzzy controllers, i.e. using neuro-fuzzy models. Classical modeling is based on mathematical models that describe the behavior of the firm under consideration. The main idea of fuzzy control, on the other hand, is to build a model of a human control expert who is capable of controlling the process without thinking in a mathematical model. This control expert specifies his control action in the form of linguistic rules. These control rules are translated into the framework of fuzzy set theory providing a calculus, which can stimulate the behavior of the control expert and enhance its performance. The accuracy of the model is studied using datasets from previous research papers.

  15. Characterizing haploinsufficiency of SHELL gene to improve fruit form prediction in introgressive hybrids of oil palm.

    Science.gov (United States)

    Teh, Chee-Keng; Muaz, Siti Dalila; Tangaya, Praveena; Fong, Po-Yee; Ong, Ai-Ling; Mayes, Sean; Chew, Fook-Tim; Kulaveerasingam, Harikrishna; Appleton, David

    2017-06-08

    The fundamental trait in selective breeding of oil palm (Eleais guineensis Jacq.) is the shell thickness surrounding the kernel. The monogenic shell thickness is inversely correlated to mesocarp thickness, where the crude palm oil accumulates. Commercial thin-shelled tenera derived from thick-shelled dura × shell-less pisifera generally contain 30% higher oil per bunch. Two mutations, sh MPOB (M1) and sh AVROS (M2) in the SHELL gene - a type II MADS-box transcription factor mainly present in AVROS and Nigerian origins, were reported to be responsible for different fruit forms. In this study, we have tested 1,339 samples maintained in Sime Darby Plantation using both mutations. Five genotype-phenotype discrepancies and eight controls were then re-tested with all five reported mutations (sh AVROS , sh MPOB , sh MPOB2 , sh MPOB3 and sh MPOB4 ) within the same gene. The integration of genotypic data, pedigree records and shell formation model further explained the haploinsufficiency effect on the SHELL gene with different number of functional copies. Some rare mutations were also identified, suggesting a need to further confirm the existence of cis-compound mutations in the gene. With this, the prediction accuracy of fruit forms can be further improved, especially in introgressive hybrids of oil palm. Understanding causative variant segregation is extremely important, even for monogenic traits such as shell thickness in oil palm.

  16. Predicting Biological Information Flow in a Model Oxygen Minimum Zone

    Science.gov (United States)

    Louca, S.; Hawley, A. K.; Katsev, S.; Beltran, M. T.; Bhatia, M. P.; Michiels, C.; Capelle, D.; Lavik, G.; Doebeli, M.; Crowe, S.; Hallam, S. J.

    2016-02-01

    Microbial activity drives marine biochemical fluxes and nutrient cycling at global scales. Geochemical measurements as well as molecular techniques such as metagenomics, metatranscriptomics and metaproteomics provide great insight into microbial activity. However, an integration of molecular and geochemical data into mechanistic biogeochemical models is still lacking. Recent work suggests that microbial metabolic pathways are, at the ecosystem level, strongly shaped by stoichiometric and energetic constraints. Hence, models rooted in fluxes of matter and energy may yield a holistic understanding of biogeochemistry. Furthermore, such pathway-centric models would allow a direct consolidation with meta'omic data. Here we present a pathway-centric biogeochemical model for the seasonal oxygen minimum zone in Saanich Inlet, a fjord off the coast of Vancouver Island. The model considers key dissimilatory nitrogen and sulfur fluxes, as well as the population dynamics of the genes that mediate them. By assuming a direct translation of biocatalyzed energy fluxes to biosynthesis rates, we make predictions about the distribution and activity of the corresponding genes. A comparison of the model to molecular measurements indicates that the model explains observed DNA, RNA, protein and cell depth profiles. This suggests that microbial activity in marine ecosystems such as oxygen minimum zones is well described by DNA abundance, which, in conjunction with geochemical constraints, determines pathway expression and process rates. Our work further demonstrates how meta'omic data can be mechanistically linked to environmental redox conditions and biogeochemical processes.

  17. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    Science.gov (United States)

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the amount of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e: Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), that displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  18. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    Science.gov (United States)

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The

  19. Predictive Models for Carcinogenicity and Mutagenicity ...

    Science.gov (United States)

    Mutagenicity and carcinogenicity are endpoints of major environmental and regulatory concern. These endpoints are also important targets for development of alternative methods for screening and prediction due to the large number of chemicals of potential concern and the tremendous cost (in time, money, animals) of rodent carcinogenicity bioassays. Both mutagenicity and carcinogenicity involve complex, cellular processes that are only partially understood. Advances in technologies and generation of new data will permit a much deeper understanding. In silico methods for predicting mutagenicity and rodent carcinogenicity based on chemical structural features, along with current mutagenicity and carcinogenicity data sets, have performed well for local prediction (i.e., within specific chemical classes), but are less successful for global prediction (i.e., for a broad range of chemicals). The predictivity of in silico methods can be improved by improving the quality of the data base and endpoints used for modelling. In particular, in vitro assays for clastogenicity need to be improved to reduce false positives (relative to rodent carcinogenicity) and to detect compounds that do not interact directly with DNA or have epigenetic activities. New assays emerging to complement or replace some of the standard assays include VitotoxTM, GreenScreenGC, and RadarScreen. The needs of industry and regulators to assess thousands of compounds necessitate the development of high-t

  20. Accurate, model-based tuning of synthetic gene expression using introns in S. cerevisiae.

    Directory of Open Access Journals (Sweden)

    Ido Yofe

    2014-06-01

    Full Text Available Introns are key regulators of eukaryotic gene expression and present a potentially powerful tool for the design of synthetic eukaryotic gene expression systems. However, intronic control over gene expression is governed by a multitude of complex, incompletely understood, regulatory mechanisms. Despite this lack of detailed mechanistic understanding, here we show how a relatively simple model enables accurate and predictable tuning of synthetic gene expression system in yeast using several predictive intron features such as transcript folding and sequence motifs. Using only natural Saccharomyces cerevisiae introns as regulators, we demonstrate fine and accurate control over gene expression spanning a 100 fold expression range. These results broaden the engineering toolbox of synthetic gene expression systems and provide a framework in which precise and robust tuning of gene expression is accomplished.

  1. Gene therapy in animal models of autosomal dominant retinitis pigmentosa

    Science.gov (United States)

    Rossmiller, Brian; Mao, Haoyu

    2012-01-01

    Gene therapy for dominantly inherited genetic disease is more difficult than gene-based therapy for recessive disorders, which can be treated with gene supplementation. Treatment of dominant disease may require gene supplementation partnered with suppression of the expression of the mutant gene either at the DNA level, by gene repair, or at the RNA level by RNA interference or transcriptional repression. In this review, we examine some of the gene delivery approaches used to treat animal models of autosomal dominant retinitis pigmentosa, focusing on those models associated with mutations in the gene for rhodopsin. We conclude that combinatorial approaches have the greatest promise for success. PMID:23077406

  2. Validated predictive modelling of the environmental resistome.

    Science.gov (United States)

    Amos, Gregory C A; Gozzard, Emma; Carter, Charlotte E; Mead, Andrew; Bowes, Mike J; Hawkey, Peter M; Zhang, Lihong; Singer, Andrew C; Gaze, William H; Wellington, Elizabeth M H

    2015-06-01

    Multi-drug-resistant bacteria pose a significant threat to public health. The role of the environment in the overall rise in antibiotic-resistant infections and risk to humans is largely unknown. This study aimed to evaluate drivers of antibiotic-resistance levels across the River Thames catchment, model key biotic, spatial and chemical variables and produce predictive models for future risk assessment. Sediment samples from 13 sites across the River Thames basin were taken at four time points across 2011 and 2012. Samples were analysed for class 1 integron prevalence and enumeration of third-generation cephalosporin-resistant bacteria. Class 1 integron prevalence was validated as a molecular marker of antibiotic resistance; levels of resistance showed significant geospatial and temporal variation. The main explanatory variables of resistance levels at each sample site were the number, proximity, size and type of surrounding wastewater-treatment plants. Model 1 revealed treatment plants accounted for 49.5% of the variance in resistance levels. Other contributing factors were extent of different surrounding land cover types (for example, Neutral Grassland), temporal patterns and prior rainfall; when modelling all variables the resulting model (Model 2) could explain 82.9% of variations in resistance levels in the whole catchment. Chemical analyses correlated with key indicators of treatment plant effluent and a model (Model 3) was generated based on water quality parameters (contaminant and macro- and micro-nutrient levels). Model 2 was beta tested on independent sites and explained over 78% of the variation in integron prevalence showing a significant predictive ability. We believe all models in this study are highly useful tools for informing and prioritising mitigation strategies to reduce the environmental resistome.

  3. Computational Prediction of MicroRNAs from Toxoplasma gondii Potentially Regulating the Hosts’ Gene Expression

    Directory of Open Access Journals (Sweden)

    Müşerref Duygu Saçar

    2014-10-01

    Full Text Available MicroRNAs (miRNAs were discovered two decades ago, yet there is still a great need for further studies elucidating their genesis and targeting in different phyla. Since experimental discovery and validation of miRNAs is difficult, computational predictions are indispensable and today most computational approaches employ machine learning. Toxoplasma gondii, a parasite residing within the cells of its hosts like human, uses miRNAs for its post-transcriptional gene regulation. It may also regulate its hosts’ gene expression, which has been shown in brain cancer. Since previous studies have shown that overexpressed miRNAs within the host are causal for disease onset, we hypothesized that T. gondii could export miRNAs into its host cell. We computationally predicted all hairpins from the genome of T. gondii and used mouse and human models to filter possible candidates. These were then further compared to known miRNAs in human and rodents and their expression was examined for T. gondii grown in mouse and human hosts, respectively. We found that among the millions of potential hairpins in T. gondii, only a few thousand pass filtering using a human or mouse model and that even fewer of those are expressed. Since they are expressed and differentially expressed in rodents and human, we suggest that there is a chance that T. gondii may export miRNAs into its hosts for direct regulation.

  4. Nonlinear model predictive control theory and algorithms

    CERN Document Server

    Grüne, Lars

    2017-01-01

    This book offers readers a thorough and rigorous introduction to nonlinear model predictive control (NMPC) for discrete-time and sampled-data systems. NMPC schemes with and without stabilizing terminal constraints are detailed, and intuitive examples illustrate the performance of different NMPC variants. NMPC is interpreted as an approximation of infinite-horizon optimal control so that important properties like closed-loop stability, inverse optimality and suboptimality can be derived in a uniform manner. These results are complemented by discussions of feasibility and robustness. An introduction to nonlinear optimal control algorithms yields essential insights into how the nonlinear optimization routine—the core of any nonlinear model predictive controller—works. Accompanying software in MATLAB® and C++ (downloadable from extras.springer.com/), together with an explanatory appendix in the book itself, enables readers to perform computer experiments exploring the possibilities and limitations of NMPC. T...

  5. Baryogenesis model predicting antimatter in the Universe

    International Nuclear Information System (INIS)

    Kirilova, D.

    2003-01-01

    Cosmic ray and gamma-ray data do not rule out antimatter domains in the Universe, separated at distances bigger than 10 Mpc from us. Hence, it is interesting to analyze the possible generation of vast antimatter structures during the early Universe evolution. We discuss a SUSY-condensate baryogenesis model, predicting large separated regions of matter and antimatter. The model provides generation of the small locally observed baryon asymmetry for a natural initial conditions, it predicts vast antimatter domains, separated from the matter ones by baryonically empty voids. The characteristic scale of antimatter regions and their distance from the matter ones is in accordance with observational constraints from cosmic ray, gamma-ray and cosmic microwave background anisotropy data

  6. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    OpenAIRE

    Wright, James C.; Sugden, Deana; Francis-McIntyre, Sue; Riba Garcia, Isabel; Gaskell, Simon J.; Grigoriev, Igor V.; Baker, Scott E.; Beynon, Robert J.; Hubbard, Simon J.

    2009-01-01

    Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI) and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS) were ac...

  7. Finding Furfural Hydrogenation Catalysts via Predictive Modelling

    OpenAIRE

    Strassberger, Zea; Mooijman, Maurice; Ruijter, Eelco; Alberts, Albert H; Maldonado, Ana G; Orru, Romano V A; Rothenberg, Gadi

    2010-01-01

    Abstract We combine multicomponent reactions, catalytic performance studies and predictive modelling to find transfer hydrogenation catalysts. An initial set of 18 ruthenium-carbene complexes were synthesized and screened in the transfer hydrogenation of furfural to furfurol with isopropyl alcohol complexes gave varied yields, from 62% up to >99.9%, with no obvious structure/activity correlations. Control experiments proved that the carbene ligand remains coordinated to the ruthenium centre t...

  8. Predictive Modeling in Actinide Chemistry and Catalysis

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Ping [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-05-16

    These are slides from a presentation on predictive modeling in actinide chemistry and catalysis. The following topics are covered in these slides: Structures, bonding, and reactivity (bonding can be quantified by optical probes and theory, and electronic structures and reaction mechanisms of actinide complexes); Magnetic resonance properties (transition metal catalysts with multi-nuclear centers, and NMR/EPR parameters); Moving to more complex systems (surface chemistry of nanomaterials, and interactions of ligands with nanoparticles); Path forward and conclusions.

  9. Tectonic predictions with mantle convection models

    Science.gov (United States)

    Coltice, Nicolas; Shephard, Grace E.

    2018-04-01

    Over the past 15 yr, numerical models of convection in Earth's mantle have made a leap forward: they can now produce self-consistent plate-like behaviour at the surface together with deep mantle circulation. These digital tools provide a new window into the intimate connections between plate tectonics and mantle dynamics, and can therefore be used for tectonic predictions, in principle. This contribution explores this assumption. First, initial conditions at 30, 20, 10 and 0 Ma are generated by driving a convective flow with imposed plate velocities at the surface. We then compute instantaneous mantle flows in response to the guessed temperature fields without imposing any boundary conditions. Plate boundaries self-consistently emerge at correct locations with respect to reconstructions, except for small plates close to subduction zones. As already observed for other types of instantaneous flow calculations, the structure of the top boundary layer and upper-mantle slab is the dominant character that leads to accurate predictions of surface velocities. Perturbations of the rheological parameters have little impact on the resulting surface velocities. We then compute fully dynamic model evolution from 30 and 10 to 0 Ma, without imposing plate boundaries or plate velocities. Contrary to instantaneous calculations, errors in kinematic predictions are substantial, although the plate layout and kinematics in several areas remain consistent with the expectations for the Earth. For these calculations, varying the rheological parameters makes a difference for plate boundary evolution. Also, identified errors in initial conditions contribute to first-order kinematic errors. This experiment shows that the tectonic predictions of dynamic models over 10 My are highly sensitive to uncertainties of rheological parameters and initial temperature field in comparison to instantaneous flow calculations. Indeed, the initial conditions and the rheological parameters can be good enough

  10. Benchmarking of gene prediction programs for metagenomic data.

    Science.gov (United States)

    Yok, Non; Rosen, Gail

    2010-01-01

    This manuscript presents the most rigorous benchmarking of gene annotation algorithms for metagenomic datasets to date. We compare three different programs: GeneMark, MetaGeneAnnotator (MGA) and Orphelia. The comparisons are based on their performances over simulated fragments from one hundred species of diverse lineages. We defined four different types of fragments; two types come from the inter- and intra-coding regions and the other types are from the gene edges. Hoff et al. used only 12 species in their comparison; therefore, their sample is too small to represent an environmental sample. Also, no predecessors has separately examined fragments that contain gene edges as opposed to intra-coding regions. General observations in our results are that performances of all these programs improve as we increase the length of the fragment. On the other hand, intra-coding fragments of our data show low annotation error in all of the programs if compared to the gene edge fragments. Overall, we found an upper-bound performance by combining all the methods.

  11. A predictive model for dimensional errors in fused deposition modeling

    DEFF Research Database (Denmark)

    Stolfi, A.

    2015-01-01

    This work concerns the effect of deposition angle (a) and layer thickness (L) on the dimensional performance of FDM parts using a predictive model based on the geometrical description of the FDM filament profile. An experimental validation over the whole a range from 0° to 177° at 3° steps and two...... values of L (0.254 mm, 0.330 mm) was produced by comparing predicted values with external face-to-face measurements. After removing outliers, the results show that the developed two-parameter model can serve as tool for modeling the FDM dimensional behavior in a wide range of deposition angles....

  12. Gene-Environment Interplay in Twin Models

    Science.gov (United States)

    Hatemi, Peter K.

    2013-01-01

    In this article, we respond to Shultziner’s critique that argues that identical twins are more alike not because of genetic similarity, but because they select into more similar environments and respond to stimuli in comparable ways, and that these effects bias twin model estimates to such an extent that they are invalid. The essay further argues that the theory and methods that undergird twin models, as well as the empirical studies which rely upon them, are unaware of these potential biases. We correct this and other misunderstandings in the essay and find that gene-environment (GE) interplay is a well-articulated concept in behavior genetics and political science, operationalized as gene-environment correlation and gene-environment interaction. Both are incorporated into interpretations of the classical twin design (CTD) and estimated in numerous empirical studies through extensions of the CTD. We then conduct simulations to quantify the influence of GE interplay on estimates from the CTD. Due to the criticism’s mischaracterization of the CTD and GE interplay, combined with the absence of any empirical evidence to counter what is presented in the extant literature and this article, we conclude that the critique does not enhance our understanding of the processes that drive political traits, genetic or otherwise. PMID:24808718

  13. Two stage neural network modelling for robust model predictive control.

    Science.gov (United States)

    Patan, Krzysztof

    2018-01-01

    The paper proposes a novel robust model predictive control scheme realized by means of artificial neural networks. The neural networks are used twofold: to design the so-called fundamental model of a plant and to catch uncertainty associated with the plant model. In order to simplify the optimization process carried out within the framework of predictive control an instantaneous linearization is applied which renders it possible to define the optimization problem in the form of constrained quadratic programming. Stability of the proposed control system is also investigated by showing that a cost function is monotonically decreasing with respect to time. Derived robust model predictive control is tested and validated on the example of a pneumatic servomechanism working at different operating regimes. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  14. A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP Inhibitors

    Science.gov (United States)

    2017-02-01

    affecting the function of Fanconi Anemia (FA) genes ( FANCA /B/C/D2/E/F/G/I/J/L/M, PALB2) or DNA damage response genes involved in HR 5 (ATM, ATR...Award Number: W81XWH-10-1-0585 TITLE: A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP Inhibitors...To) 15 July 2010 – 2 Nov.2016 4. TITLE AND SUBTITLE A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP

  15. Predicting extinction rates in stochastic epidemic models

    International Nuclear Information System (INIS)

    Schwartz, Ira B; Billings, Lora; Dykman, Mark; Landsman, Alexandra

    2009-01-01

    We investigate the stochastic extinction processes in a class of epidemic models. Motivated by the process of natural disease extinction in epidemics, we examine the rate of extinction as a function of disease spread. We show that the effective entropic barrier for extinction in a susceptible–infected–susceptible epidemic model displays scaling with the distance to the bifurcation point, with an unusual critical exponent. We make a direct comparison between predictions and numerical simulations. We also consider the effect of non-Gaussian vaccine schedules, and show numerically how the extinction process may be enhanced when the vaccine schedules are Poisson distributed

  16. Predictive Modeling of the CDRA 4BMS

    Science.gov (United States)

    Coker, Robert F.; Knox, James C.

    2016-01-01

    As part of NASA's Advanced Exploration Systems (AES) program and the Life Support Systems Project (LSSP), fully predictive models of the Four Bed Molecular Sieve (4BMS) of the Carbon Dioxide Removal Assembly (CDRA) on the International Space Station (ISS) are being developed. This virtual laboratory will be used to help reduce mass, power, and volume requirements for future missions. In this paper we describe current and planned modeling developments in the area of carbon dioxide removal to support future crewed Mars missions as well as the resolution of anomalies observed in the ISS CDRA.

  17. Predictive value of MSH2 gene expression in colorectal cancer treated with capecitabine

    DEFF Research Database (Denmark)

    Jensen, Lars H; Danenberg, Kathleen D; Danenberg, Peter V

    2007-01-01

    was associated with a hazard ratio of 0.5 (95% confidence interval, 0.23-1.11; P = 0.083) in survival analysis. CONCLUSION: The higher gene expression of MSH2 in responders and the trend for predicting overall survival indicates a predictive value of this marker in the treatment of advanced CRC with capecitabine.......PURPOSE: The objective of the present study was to evaluate the gene expression of the DNA mismatch repair gene MSH2 as a predictive marker in advanced colorectal cancer (CRC) treated with first-line capecitabine. PATIENTS AND METHODS: Microdissection of paraffin-embedded tumor tissue, RNA...

  18. Data Driven Economic Model Predictive Control

    Directory of Open Access Journals (Sweden)

    Masoud Kheradmandi

    2018-04-01

    Full Text Available This manuscript addresses the problem of data driven model based economic model predictive control (MPC design. To this end, first, a data-driven Lyapunov-based MPC is designed, and shown to be capable of stabilizing a system at an unstable equilibrium point. The data driven Lyapunov-based MPC utilizes a linear time invariant (LTI model cognizant of the fact that the training data, owing to the unstable nature of the equilibrium point, has to be obtained from closed-loop operation or experiments. Simulation results are first presented demonstrating closed-loop stability under the proposed data-driven Lyapunov-based MPC. The underlying data-driven model is then utilized as the basis to design an economic MPC. The economic improvements yielded by the proposed method are illustrated through simulations on a nonlinear chemical process system example.

  19. A novel gene network inference algorithm using predictive minimum description length approach.

    Science.gov (United States)

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the

  20. A molecular prognostic model predicts esophageal squamous cell carcinoma prognosis.

    Directory of Open Access Journals (Sweden)

    Hui-Hui Cao

    Full Text Available Esophageal squamous cell carcinoma (ESCC has the highest mortality rates in China. The 5-year survival rate of ESCC remains dismal despite improvements in treatments such as surgical resection and adjuvant chemoradiation, and current clinical staging approaches are limited in their ability to effectively stratify patients for treatment options. The aim of the present study, therefore, was to develop an immunohistochemistry-based prognostic model to improve clinical risk assessment for patients with ESCC.We developed a molecular prognostic model based on the combined expression of axis of epidermal growth factor receptor (EGFR, phosphorylated Specificity protein 1 (p-Sp1, and Fascin proteins. The presence of this prognostic model and associated clinical outcomes were analyzed for 130 formalin-fixed, paraffin-embedded esophageal curative resection specimens (generation dataset and validated using an independent cohort of 185 specimens (validation dataset.The expression of these three genes at the protein level was used to build a molecular prognostic model that was highly predictive of ESCC survival in both generation and validation datasets (P = 0.001. Regression analysis showed that this molecular prognostic model was strongly and independently predictive of overall survival (hazard ratio = 2.358 [95% CI, 1.391-3.996], P = 0.001 in generation dataset; hazard ratio = 1.990 [95% CI, 1.256-3.154], P = 0.003 in validation dataset. Furthermore, the predictive ability of these 3 biomarkers in combination was more robust than that of each individual biomarker.This technically simple immunohistochemistry-based molecular model accurately predicts ESCC patient survival and thus could serve as a complement to current clinical risk stratification approaches.

  1. Plant control using embedded predictive models

    International Nuclear Information System (INIS)

    Godbole, S.S.; Gabler, W.E.; Eschbach, S.L.

    1990-01-01

    B and W recently undertook the design of an advanced light water reactor control system. A concept new to nuclear steam system (NSS) control was developed. The concept, which is called the Predictor-Corrector, uses mathematical models of portions of the controlled NSS to calculate, at various levels within the system, demand and control element position signals necessary to satisfy electrical demand. The models give the control system the ability to reduce overcooling and undercooling of the reactor coolant system during transients and upsets. Two types of mathematical models were developed for use in designing and testing the control system. One model was a conventional, comprehensive NSS model that responds to control system outputs and calculates the resultant changes in plant variables that are then used as inputs to the control system. Two other models, embedded in the control system, were less conventional, inverse models. These models accept as inputs plant variables, equipment states, and demand signals and predict plant operating conditions and control element states that will satisfy the demands. This paper reports preliminary results of closed-loop Reactor Coolant (RC) pump trip and normal load reduction testing of the advanced concept. Results of additional transient testing, and of open and closed loop stability analyses will be reported as they are available

  2. Exploring gene expression signatures for predicting disease free survival after resection of colorectal cancer liver metastases.

    Directory of Open Access Journals (Sweden)

    Nikol Snoeren

    Full Text Available BACKGROUND AND OBJECTIVES: This study was designed to identify and validate gene signatures that can predict disease free survival (DFS in patients undergoing a radical resection for their colorectal liver metastases (CRLM. METHODS: Tumor gene expression profiles were collected from 119 patients undergoing surgery for their CRLM in the Paul Brousse Hospital (France and the University Medical Center Utrecht (The Netherlands. Patients were divided into high and low risk groups. A randomly selected training set was used to find predictive gene signatures. The ability of these gene signatures to predict DFS was tested in an independent validation set comprising the remaining patients. Furthermore, 5 known clinical risk scores were tested in our complete patient cohort. RESULT: No gene signature was found that significantly predicted DFS in the validation set. In contrast, three out of five clinical risk scores were able to predict DFS in our patient cohort. CONCLUSIONS: No gene signature was found that could predict DFS in patients undergoing CRLM resection. Three out of five clinical risk scores were able to predict DFS in our patient cohort. These results emphasize the need for validating risk scores in independent patient groups and suggest improved designs for future studies.

  3. Prediction of target genes for miR-140-5p in pulmonary arterial hypertension using bioinformatics methods.

    Science.gov (United States)

    Li, Fangwei; Shi, Wenhua; Wan, Yixin; Wang, Qingting; Feng, Wei; Yan, Xin; Wang, Jian; Chai, Limin; Zhang, Qianqian; Li, Manxiang

    2017-12-01

    The expression of microRNA (miR)-140-5p is known to be reduced in both pulmonary arterial hypertension (PAH) patients and monocrotaline-induced PAH models in rat. Identification of target genes for miR-140-5p with bioinformatics analysis may reveal new pathways and connections in PAH. This study aimed to explore downstream target genes and relevant signaling pathways regulated by miR-140-5p to provide theoretical evidences for further researches on role of miR-140-5p in PAH. Multiple downstream target genes and upstream transcription factors (TFs) of miR-140-5p were predicted in the analysis. Gene ontology (GO) enrichment analysis indicated that downstream target genes of miR-140-5p were enriched in many biological processes, such as biological regulation, signal transduction, response to chemical stimulus, stem cell proliferation, cell surface receptor signaling pathways. Kyoto Encyclopedia of Genes and Genome (KEGG) pathway analysis found that downstream target genes were mainly located in Notch, TGF-beta, PI3K/Akt, and Hippo signaling pathway. According to TF-miRNA-mRNA network, the important downstream target genes of miR-140-5p were PPI, TGF-betaR1, smad4, JAG1, ADAM10, FGF9, PDGFRA, VEGFA, LAMC1, TLR4, and CREB. After thoroughly reviewing published literature, we found that 23 target genes and seven signaling pathways were truly inhibited by miR-140-5p in various tissues or cells; most of these verified targets were in accordance with our present prediction. Other predicted targets still need further verification in vivo and in vitro .

  4. Ground Motion Prediction Models for Caucasus Region

    Science.gov (United States)

    Jorjiashvili, Nato; Godoladze, Tea; Tvaradze, Nino; Tumanova, Nino

    2016-04-01

    Ground motion prediction models (GMPMs) relate ground motion intensity measures to variables describing earthquake source, path, and site effects. Estimation of expected ground motion is a fundamental earthquake hazard assessment. The most commonly used parameter for attenuation relation is peak ground acceleration or spectral acceleration because this parameter gives useful information for Seismic Hazard Assessment. Since 2003 development of Georgian Digital Seismic Network has started. In this study new GMP models are obtained based on new data from Georgian seismic network and also from neighboring countries. Estimation of models is obtained by classical, statistical way, regression analysis. In this study site ground conditions are additionally considered because the same earthquake recorded at the same distance may cause different damage according to ground conditions. Empirical ground-motion prediction models (GMPMs) require adjustment to make them appropriate for site-specific scenarios. However, the process of making such adjustments remains a challenge. This work presents a holistic framework for the development of a peak ground acceleration (PGA) or spectral acceleration (SA) GMPE that is easily adjustable to different seismological conditions and does not suffer from the practical problems associated with adjustments in the response spectral domain.

  5. Modeling and Prediction of Krueger Device Noise

    Science.gov (United States)

    Guo, Yueping; Burley, Casey L.; Thomas, Russell H.

    2016-01-01

    This paper presents the development of a noise prediction model for aircraft Krueger flap devices that are considered as alternatives to leading edge slotted slats. The prediction model decomposes the total Krueger noise into four components, generated by the unsteady flows, respectively, in the cove under the pressure side surface of the Krueger, in the gap between the Krueger trailing edge and the main wing, around the brackets supporting the Krueger device, and around the cavity on the lower side of the main wing. For each noise component, the modeling follows a physics-based approach that aims at capturing the dominant noise-generating features in the flow and developing correlations between the noise and the flow parameters that control the noise generation processes. The far field noise is modeled using each of the four noise component's respective spectral functions, far field directivities, Mach number dependencies, component amplitudes, and other parametric trends. Preliminary validations are carried out by using small scale experimental data, and two applications are discussed; one for conventional aircraft and the other for advanced configurations. The former focuses on the parametric trends of Krueger noise on design parameters, while the latter reveals its importance in relation to other airframe noise components.

  6. Prediction of Chemical Function: Model Development and ...

    Science.gov (United States)

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposures developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g. solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration in which it is present and therefore impacting exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning based models for classifying chemicals in terms of their likely functional roles in products based on structure were developed. This effort required collection, curation, and harmonization of publically-available data sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the method of random forests. The models were applied to: 1) predict chemi

  7. Evaluating Predictive Models of Software Quality

    Science.gov (United States)

    Ciaschini, V.; Canaparo, M.; Ronchieri, E.; Salomoni, D.

    2014-06-01

    Applications from High Energy Physics scientific community are constantly growing and implemented by a large number of developers. This implies a strong churn on the code and an associated risk of faults, which is unavoidable as long as the software undergoes active evolution. However, the necessities of production systems run counter to this. Stability and predictability are of paramount importance; in addition, a short turn-around time for the defect discovery-correction-deployment cycle is required. A way to reconcile these opposite foci is to use a software quality model to obtain an approximation of the risk before releasing a program to only deliver software with a risk lower than an agreed threshold. In this article we evaluated two quality predictive models to identify the operational risk and the quality of some software products. We applied these models to the development history of several EMI packages with intent to discover the risk factor of each product and compare it with its real history. We attempted to determine if the models reasonably maps reality for the applications under evaluation, and finally we concluded suggesting directions for further studies.

  8. Predicting FLDs Using a Multiscale Modeling Scheme

    Science.gov (United States)

    Wu, Z.; Loy, C.; Wang, E.; Hegadekatte, V.

    2017-09-01

    The measurement of a single forming limit diagram (FLD) requires significant resources and is time consuming. We have developed a multiscale modeling scheme to predict FLDs using a combination of limited laboratory testing, crystal plasticity (VPSC) modeling, and dual sequential-stage finite element (ABAQUS/Explicit) modeling with the Marciniak-Kuczynski (M-K) criterion to determine the limit strain. We have established a means to work around existing limitations in ABAQUS/Explicit by using an anisotropic yield locus (e.g., BBC2008) in combination with the M-K criterion. We further apply a VPSC model to reduce the number of laboratory tests required to characterize the anisotropic yield locus. In the present work, we show that the predicted FLD is in excellent agreement with the measured FLD for AA5182 in the O temper. Instead of 13 different tests as for a traditional FLD determination within Novelis, our technique uses just four measurements: tensile properties in three orientations; plane strain tension; biaxial bulge; and the sheet crystallographic texture. The turnaround time is consequently far less than for the traditional laboratory measurement of the FLD.

  9. PREDICTION MODELS OF GRAIN YIELD AND CHARACTERIZATION

    Directory of Open Access Journals (Sweden)

    Narciso Ysac Avila Serrano

    2009-06-01

    Full Text Available With the objective to characterize the grain yield of five cowpea cultivars and to find linear regression models to predict it, a study was developed in La Paz, Baja California Sur, Mexico. A complete randomized blocks design was used. Simple and multivariate analyses of variance were carried out using the canonical variables to characterize the cultivars. The variables cluster per plant, pods per plant, pods per cluster, seeds weight per plant, seeds hectoliter weight, 100-seed weight, seeds length, seeds wide, seeds thickness, pods length, pods wide, pods weight, seeds per pods, and seeds weight per pods, showed significant differences (P≤ 0.05 among cultivars. Paceño and IT90K-277-2 cultivars showed the higher seeds weight per plant. The linear regression models showed correlation coefficients ≥0.92. In these models, the seeds weight per plant, pods per cluster, pods per plant, cluster per plant and pods length showed significant correlations (P≤ 0.05. In conclusion, the results showed that grain yield differ among cultivars and for its estimation, the prediction models showed determination coefficients highly dependable.

  10. Evaluating predictive models of software quality

    International Nuclear Information System (INIS)

    Ciaschini, V; Canaparo, M; Ronchieri, E; Salomoni, D

    2014-01-01

    Applications from High Energy Physics scientific community are constantly growing and implemented by a large number of developers. This implies a strong churn on the code and an associated risk of faults, which is unavoidable as long as the software undergoes active evolution. However, the necessities of production systems run counter to this. Stability and predictability are of paramount importance; in addition, a short turn-around time for the defect discovery-correction-deployment cycle is required. A way to reconcile these opposite foci is to use a software quality model to obtain an approximation of the risk before releasing a program to only deliver software with a risk lower than an agreed threshold. In this article we evaluated two quality predictive models to identify the operational risk and the quality of some software products. We applied these models to the development history of several EMI packages with intent to discover the risk factor of each product and compare it with its real history. We attempted to determine if the models reasonably maps reality for the applications under evaluation, and finally we concluded suggesting directions for further studies.

  11. Gamma-Ray Pulsars Models and Predictions

    CERN Document Server

    Harding, A K

    2001-01-01

    Pulsed emission from gamma-ray pulsars originates inside the magnetosphere, from radiation by charged particles accelerated near the magnetic poles or in the outer gaps. In polar cap models, the high energy spectrum is cut off by magnetic pair production above an energy that is dependent on the local magnetic field strength. While most young pulsars with surface fields in the range B = 10^{12} - 10^{13} G are expected to have high energy cutoffs around several GeV, the gamma-ray spectra of old pulsars having lower surface fields may extend to 50 GeV. Although the gamma-ray emission of older pulsars is weaker, detecting pulsed emission at high energies from nearby sources would be an important confirmation of polar cap models. Outer gap models predict more gradual high-energy turnovers at around 10 GeV, but also predict an inverse Compton component extending to TeV energies. Detection of pulsed TeV emission, which would not survive attenuation at the polar caps, is thus an important test of outer gap models. N...

  12. Artificial Neural Network Model for Predicting Compressive

    Directory of Open Access Journals (Sweden)

    Salim T. Yousif

    2013-05-01

    Full Text Available   Compressive strength of concrete is a commonly used criterion in evaluating concrete. Although testing of the compressive strength of concrete specimens is done routinely, it is performed on the 28th day after concrete placement. Therefore, strength estimation of concrete at early time is highly desirable. This study presents the effort in applying neural network-based system identification techniques to predict the compressive strength of concrete based on concrete mix proportions, maximum aggregate size (MAS, and slump of fresh concrete. Back-propagation neural networks model is successively developed, trained, and tested using actual data sets of concrete mix proportions gathered from literature.    The test of the model by un-used data within the range of input parameters shows that the maximum absolute error for model is about 20% and 88% of the output results has absolute errors less than 10%. The parametric study shows that water/cement ratio (w/c is the most significant factor  affecting the output of the model.     The results showed that neural networks has strong potential as a feasible tool for predicting compressive strength of concrete.

  13. Clinical Predictive Modeling Development and Deployment through FHIR Web Services.

    Science.gov (United States)

    Khalilia, Mohammed; Choi, Myung; Henderson, Amelia; Iyengar, Sneha; Braunstein, Mark; Sun, Jimeng

    2015-01-01

    Clinical predictive modeling involves two challenging tasks: model development and model deployment. In this paper we demonstrate a software architecture for developing and deploying clinical predictive models using web services via the Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standard. The services enable model development using electronic health records (EHRs) stored in OMOP CDM databases and model deployment for scoring individual patients through FHIR resources. The MIMIC2 ICU dataset and a synthetic outpatient dataset were transformed into OMOP CDM databases for predictive model development. The resulting predictive models are deployed as FHIR resources, which receive requests of patient information, perform prediction against the deployed predictive model and respond with prediction scores. To assess the practicality of this approach we evaluated the response and prediction time of the FHIR modeling web services. We found the system to be reasonably fast with one second total response time per patient prediction.

  14. Prediction of the contact sensitizing potential of chemicals using analysis of gene expression changes in human THP-1 monocytes.

    Science.gov (United States)

    Arkusz, Joanna; Stępnik, Maciej; Sobala, Wojciech; Dastych, Jarosław

    2010-11-10

    The aim of this study was to find differentially regulated genes in THP-1 monocytic cells exposed to sensitizers and nonsensitizers and to investigate if such genes could be reliable markers for an in vitro predictive method for the identification of skin sensitizing chemicals. Changes in expression of 35 genes in the THP-1 cell line following treatment with chemicals of different sensitizing potential (from nonsensitizers to extreme sensitizers) were assessed using real-time PCR. Verification of 13 candidate genes by testing a large number of chemicals (an additional 22 sensitizers and 8 nonsensitizers) revealed that prediction of contact sensitization potential was possible based on evaluation of changes in three genes: IL8, HMOX1 and PAIMP1. In total, changes in expression of these genes allowed correct detection of sensitization potential of 21 out of 27 (78%) test sensitizers. The gene expression levels inside potency groups varied and did not allow estimation of sensitization potency of test chemicals. Results of this study indicate that evaluation of changes in expression of proposed biomarkers in THP-1 cells could be a valuable model for preliminary screening of chemicals to discriminate an appreciable majority of sensitizers from nonsensitizers. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  15. ABC gene-ranking for prediction of drug-induced cholestasis in rats

    Directory of Open Access Journals (Sweden)

    Yauheniya Cherkas

    Full Text Available As legacy toxicogenomics databases have become available, improved data mining approaches are now key to extracting and visualizing subtle relationships between toxicants and gene expression. In the present study, a novel “aggregating bundles of clusters” (ABC procedure was applied to separate cholestatic from non-cholestatic drugs and model toxicants in the Johnson & Johnson (Janssen rat liver toxicogenomics database [3]. Drug-induced cholestasis is an important issue, particularly when a new compound enters the market with this liability, with standard preclinical models often mispredicting this toxicity. Three well-characterized cholestasis-responsive genes (Cyp7a1, Mrp3 and Bsep were chosen from a previous in-house Janssen gene expression signature; these three genes show differing, non-redundant responses across the 90+ paradigm compounds in our database. Using the ABC procedure, extraneous contributions were minimized in comparisons of compound gene responses. All genes were assigned weights proportional to their correlations with Cyp7a1, Mrp3 and Bsep, and a resampling technique was used to derive a stable measure of compound similarity. The compounds that were known to be associated with rat cholestasis generally had small values of this measure relative to each other but also had large values of this measure relative to non-cholestatic compounds. Visualization of the data with the ABC-derived signature showed a very tight, essentially identically behaving cluster of robust human cholestatic drugs and experimental cholestatic toxicants (ethinyl estradiol, LPS, ANIT and methylene dianiline, disulfiram, naltrexone, methapyrilene, phenacetin, alpha-methyl dopa, flutamide, the NSAIDs–—indomethacin, flurbiprofen, diclofenac, flufenamic acid, sulindac, and nimesulide, butylated hydroxytoluene, piperonyl butoxide, and bromobenzene, some slightly less active compounds (3′-acetamidofluorene, amsacrine, hydralazine, tannic acid, some

  16. An analytical model for climatic predictions

    International Nuclear Information System (INIS)

    Njau, E.C.

    1990-12-01

    A climatic model based upon analytical expressions is presented. This model is capable of making long-range predictions of heat energy variations on regional or global scales. These variations can then be transformed into corresponding variations of some other key climatic parameters since weather and climatic changes are basically driven by differential heating and cooling around the earth. On the basis of the mathematical expressions upon which the model is based, it is shown that the global heat energy structure (and hence the associated climatic system) are characterized by zonally as well as latitudinally propagating fluctuations at frequencies downward of 0.5 day -1 . We have calculated the propagation speeds for those particular frequencies that are well documented in the literature. The calculated speeds are in excellent agreement with the measured speeds. (author). 13 refs

  17. An Anisotropic Hardening Model for Springback Prediction

    Science.gov (United States)

    Zeng, Danielle; Xia, Z. Cedric

    2005-08-01

    As more Advanced High-Strength Steels (AHSS) are heavily used for automotive body structures and closures panels, accurate springback prediction for these components becomes more challenging because of their rapid hardening characteristics and ability to sustain even higher stresses. In this paper, a modified Mroz hardening model is proposed to capture realistic Bauschinger effect at reverse loading, such as when material passes through die radii or drawbead during sheet metal forming process. This model accounts for material anisotropic yield surface and nonlinear isotropic/kinematic hardening behavior. Material tension/compression test data are used to accurately represent Bauschinger effect. The effectiveness of the model is demonstrated by comparison of numerical and experimental springback results for a DP600 straight U-channel test.

  18. An Anisotropic Hardening Model for Springback Prediction

    International Nuclear Information System (INIS)

    Zeng, Danielle; Xia, Z. Cedric

    2005-01-01

    As more Advanced High-Strength Steels (AHSS) are heavily used for automotive body structures and closures panels, accurate springback prediction for these components becomes more challenging because of their rapid hardening characteristics and ability to sustain even higher stresses. In this paper, a modified Mroz hardening model is proposed to capture realistic Bauschinger effect at reverse loading, such as when material passes through die radii or drawbead during sheet metal forming process. This model accounts for material anisotropic yield surface and nonlinear isotropic/kinematic hardening behavior. Material tension/compression test data are used to accurately represent Bauschinger effect. The effectiveness of the model is demonstrated by comparison of numerical and experimental springback results for a DP600 straight U-channel test

  19. Prediction of metabolic flux distribution from gene expression data based on the flux minimization principle.

    Directory of Open Access Journals (Sweden)

    Hyun-Seob Song

    Full Text Available Prediction of possible flux distributions in a metabolic network provides detailed phenotypic information that links metabolism to cellular physiology. To estimate metabolic steady-state fluxes, the most common approach is to solve a set of macroscopic mass balance equations subjected to stoichiometric constraints while attempting to optimize an assumed optimal objective function. This assumption is justifiable in specific cases but may be invalid when tested across different conditions, cell populations, or other organisms. With an aim to providing a more consistent and reliable prediction of flux distributions over a wide range of conditions, in this article we propose a framework that uses the flux minimization principle to predict active metabolic pathways from mRNA expression data. The proposed algorithm minimizes a weighted sum of flux magnitudes, while biomass production can be bounded to fit an ample range from very low to very high values according to the analyzed context. We have formulated the flux weights as a function of the corresponding enzyme reaction's gene expression value, enabling the creation of context-specific fluxes based on a generic metabolic network. In case studies of wild-type Saccharomyces cerevisiae, and wild-type and mutant Escherichia coli strains, our method achieved high prediction accuracy, as gauged by correlation coefficients and sums of squared error, with respect to the experimentally measured values. In contrast to other approaches, our method was able to provide quantitative predictions for both model organisms under a variety of conditions. Our approach requires no prior knowledge or assumption of a context-specific metabolic functionality and does not require trial-and-error parameter adjustments. Thus, our framework is of general applicability for modeling the transcription-dependent metabolism of bacteria and yeasts.

  20. Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrroly-sine containing genes

    DEFF Research Database (Denmark)

    Have, Christian Theil; Zambach, Sine; Christiansen, Henning

    2013-01-01

    for prediction of pyrrolysine incorporating genes in genomes of bacteria and archaea leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates...... for experimental verification. The method is implemented as a computational pipeline which is available on request....

  1. Predicting effects of structural stress in a genome-reduced model bacterial metabolism

    Science.gov (United States)

    Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles

    2012-08-01

    Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated to reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that, simultaneously, acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.

  2. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

    Directory of Open Access Journals (Sweden)

    Grigoriev Igor V

    2009-02-01

    Full Text Available Abstract Background Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institutes (JGI and another predicted protein set from another A.niger sequence. Tandem mass spectra (MS/MS were acquired from 1d gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR. Results 405 identified peptide sequences were mapped to 214 different A.niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6% of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. Conclusion This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST data has been. A comparison of the published genome from another strain of A.niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.

  3. Effects of sample size on robustness and prediction accuracy of a prognostic gene signature

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2009-05-01

    Full Text Available Abstract Background Few overlap between independently developed gene signatures and poor inter-study applicability of gene signatures are two of major concerns raised in the development of microarray-based prognostic gene signatures. One recent study suggested that thousands of samples are needed to generate a robust prognostic gene signature. Results A data set of 1,372 samples was generated by combining eight breast cancer gene expression data sets produced using the same microarray platform and, using the data set, effects of varying samples sizes on a few performances of a prognostic gene signature were investigated. The overlap between independently developed gene signatures was increased linearly with more samples, attaining an average overlap of 16.56% with 600 samples. The concordance between predicted outcomes by different gene signatures also was increased with more samples up to 94.61% with 300 samples. The accuracy of outcome prediction also increased with more samples. Finally, analysis using only Estrogen Receptor-positive (ER+ patients attained higher prediction accuracy than using both patients, suggesting that sub-type specific analysis can lead to the development of better prognostic gene signatures Conclusion Increasing sample sizes generated a gene signature with better stability, better concordance in outcome prediction, and better prediction accuracy. However, the degree of performance improvement by the increased sample size was different between the degree of overlap and the degree of concordance in outcome prediction, suggesting that the sample size required for a study should be determined according to the specific aims of the study.

  4. Protein modelling of triterpene synthase genes from mangrove plants using Phyre2 and Swiss-model

    Science.gov (United States)

    Basyuni, M.; Wati, R.; Sulistiyono, N.; Hayati, R.; Sumardi; Oku, H.; Baba, S.; Sagami, H.

    2018-03-01

    Molecular cloning of five oxidosqualene cyclases (OSC) genes from Bruguiera gymnorrhiza, Kandelia candel, and Rhizophora stylosa had previously been cloned, characterized, and encoded mono and -multi triterpene synthases. The present study analyzed protein modelling of triterpene synthase genes from mangrove using Phyre2 and Swiss-model. The diversity was noted within protein modelling of triterpene synthases using Phyre2 from sequence identity (38-43%) and residue (696-703). RsM2 was distinguishable from others for template structure; it used lanosterol synthase as a template (PDB ID: w6j.1.A). By contrast, other genes used human lanosterol synthase (1w6k.1.A). The predicted bind sites were correlated with the product of triterpene synthase, the product of BgbAS was β-amyrin, while RsM1 contained a significant amount of β-amyrin. Similarly BgLUS and KcMS, both main products was lupeol, on the other hand, RsM2 with the outcome of taraxerol. Homology modelling revealed that 696 residues of BgbAS, BgLUS, RsM1, and RsM2 (91-92% of the amino acid sequence) had been modelled with 100% confidence by the single highest scoring template using Phyre2. This coverage was higher than Swiss-model (85-90%). The present study suggested that molecular cloning of triterpene genes provides useful tools for studying the protein modelling related regulation of isoprenoids biosynthesis in mangrove forests.

  5. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

    Science.gov (United States)

    Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias

    2015-06-25

    Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

  6. Bioinformatics analysis of the predicted polyprenol reductase genes in higher plants

    Science.gov (United States)

    Basyuni, M.; Wati, R.

    2018-03-01

    The present study evaluates the bioinformatics methods to analyze twenty-four predicted polyprenol reductase genes from higher plants on GenBank as well as predicted the structure, composition, similarity, subcellular localization, and phylogenetic. The physicochemical properties of plant polyprenol showed diversity among the observed genes. The percentage of the secondary structure of plant polyprenol genes followed the ratio order of α helix > random coil > extended chain structure. The values of chloroplast but not signal peptide were too low, indicated that few chloroplast transit peptide in plant polyprenol reductase genes. The possibility of the potential transit peptide showed variation among the plant polyprenol reductase, suggested the importance of understanding the variety of peptide components of plant polyprenol genes. To clarify this finding, a phylogenetic tree was drawn. The phylogenetic tree shows several branches in the tree, suggested that plant polyprenol reductase genes grouped into divergent clusters in the tree.

  7. Web tools for predictive toxicology model building.

    Science.gov (United States)

    Jeliazkova, Nina

    2012-07-01

    The development and use of web tools in chemistry has accumulated more than 15 years of history already. Powered by the advances in the Internet technologies, the current generation of web systems are starting to expand into areas, traditional for desktop applications. The web platforms integrate data storage, cheminformatics and data analysis tools. The ease of use and the collaborative potential of the web is compelling, despite the challenges. The topic of this review is a set of recently published web tools that facilitate predictive toxicology model building. The focus is on software platforms, offering web access to chemical structure-based methods, although some of the frameworks could also provide bioinformatics or hybrid data analysis functionalities. A number of historical and current developments are cited. In order to provide comparable assessment, the following characteristics are considered: support for workflows, descriptor calculations, visualization, modeling algorithms, data management and data sharing capabilities, availability of GUI or programmatic access and implementation details. The success of the Web is largely due to its highly decentralized, yet sufficiently interoperable model for information access. The expected future convergence between cheminformatics and bioinformatics databases provides new challenges toward management and analysis of large data sets. The web tools in predictive toxicology will likely continue to evolve toward the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access.

  8. Bioinformatic Prediction of Gene Functions Regulated by Quorum Sensing in the Bioleaching Bacterium Acidithiobacillus ferrooxidans

    Directory of Open Access Journals (Sweden)

    Alvaro Banderas

    2013-08-01

    Full Text Available The biomining bacterium Acidithiobacillus ferrooxidans oxidizes sulfide ores and promotes metal solubilization. The efficiency of this process depends on the attachment of cells to surfaces, a process regulated by quorum sensing (QS cell-to-cell signalling in many Gram-negative bacteria. At. ferrooxidans has a functional QS system and the presence of AHLs enhances its attachment to pyrite. However, direct targets of the QS transcription factor AfeR remain unknown. In this study, a bioinformatic approach was used to infer possible AfeR direct targets based on the particular palindromic features of the AfeR binding site. A set of Hidden Markov Models designed to maintain palindromic regions and vary non-palindromic regions was used to screen for putative binding sites. By annotating the context of each predicted binding site (PBS, we classified them according to their positional coherence relative to other putative genomic structures such as start codons, RNA polymerase promoter elements and intergenic regions. We further used the Multiple EM for Motif Elicitation algorithm (MEME to further filter out low homology PBSs. In summary, 75 target-genes were identified, 34 of which have a higher confidence level. Among the identified genes, we found afeR itself, zwf, genes encoding glycosyltransferase activities, metallo-beta lactamases, and active transport-related proteins. Glycosyltransferases and Zwf (Glucose 6-phosphate-1-dehydrogenase might be directly involved in polysaccharide biosynthesis and attachment to minerals by At. ferrooxidans cells during the bioleaching process.

  9. Bioinformatic Prediction of Gene Functions Regulated by Quorum Sensing in the Bioleaching Bacterium Acidithiobacillus ferrooxidans

    Science.gov (United States)

    Banderas, Alvaro; Guiliani, Nicolas

    2013-01-01

    The biomining bacterium Acidithiobacillus ferrooxidans oxidizes sulfide ores and promotes metal solubilization. The efficiency of this process depends on the attachment of cells to surfaces, a process regulated by quorum sensing (QS) cell-to-cell signalling in many Gram-negative bacteria. At. ferrooxidans has a functional QS system and the presence of AHLs enhances its attachment to pyrite. However, direct targets of the QS transcription factor AfeR remain unknown. In this study, a bioinformatic approach was used to infer possible AfeR direct targets based on the particular palindromic features of the AfeR binding site. A set of Hidden Markov Models designed to maintain palindromic regions and vary non-palindromic regions was used to screen for putative binding sites. By annotating the context of each predicted binding site (PBS), we classified them according to their positional coherence relative to other putative genomic structures such as start codons, RNA polymerase promoter elements and intergenic regions. We further used the Multiple EM for Motif Elicitation algorithm (MEME) to further filter out low homology PBSs. In summary, 75 target-genes were identified, 34 of which have a higher confidence level. Among the identified genes, we found afeR itself, zwf, genes encoding glycosyltransferase activities, metallo-beta lactamases, and active transport-related proteins. Glycosyltransferases and Zwf (Glucose 6-phosphate-1-dehydrogenase) might be directly involved in polysaccharide biosynthesis and attachment to minerals by At. ferrooxidans cells during the bioleaching process. PMID:23959118

  10. Calibration of Multiple In Silico Tools for Predicting Pathogenicity of Mismatch Repair Gene Missense Substitutions

    Science.gov (United States)

    Thompson, Bryony A.; Greenblatt, Marc S.; Vallee, Maxime P.; Herkert, Johanna C.; Tessereau, Chloe; Young, Erin L.; Adzhubey, Ivan A.; Li, Biao; Bell, Russell; Feng, Bingjian; Mooney, Sean D.; Radivojac, Predrag; Sunyaev, Shamil R.; Frebourg, Thierry; Hofstra, Robert M.W.; Sijmons, Rolf H.; Boucher, Ken; Thomas, Alun; Goldgar, David E.; Spurdle, Amanda B.; Tavtigian, Sean V.

    2015-01-01

    Classification of rare missense substitutions observed during genetic testing for patient management is a considerable problem in clinical genetics. The Bayesian integrated evaluation of unclassified variants is a solution originally developed for BRCA1/2. Here, we take a step toward an analogous system for the mismatch repair (MMR) genes (MLH1, MSH2, MSH6, and PMS2) that confer colon cancer susceptibility in Lynch syndrome by calibrating in silico tools to estimate prior probabilities of pathogenicity for MMR gene missense substitutions. A qualitative five-class classification system was developed and applied to 143 MMR missense variants. This identified 74 missense substitutions suitable for calibration. These substitutions were scored using six different in silico tools (Align-Grantham Variation Grantham Deviation, multivariate analysis of protein polymorphisms [MAPP], Mut-Pred, PolyPhen-2.1, Sorting Intolerant From Tolerant, and Xvar), using curated MMR multiple sequence alignments where possible. The output from each tool was calibrated by regression against the classifications of the 74 missense substitutions; these calibrated outputs are interpretable as prior probabilities of pathogenicity. MAPP was the most accurate tool and MAPP + PolyPhen-2.1 provided the best-combined model (R2 = 0.62 and area under receiver operating characteristic = 0.93). The MAPP + PolyPhen-2.1 output is sufficiently predictive to feed as a continuous variable into the quantitative Bayesian integrated evaluation for clinical classification of MMR gene missense substitutions. PMID:22949387

  11. Predictions of models for environmental radiological assessment

    International Nuclear Information System (INIS)

    Peres, Sueli da Silva; Lauria, Dejanira da Costa; Mahler, Claudio Fernando

    2011-01-01

    In the field of environmental impact assessment, models are used for estimating source term, environmental dispersion and transfer of radionuclides, exposure pathway, radiation dose and the risk for human beings Although it is recognized that the specific information of local data are important to improve the quality of the dose assessment results, in fact obtaining it can be very difficult and expensive. Sources of uncertainties are numerous, among which we can cite: the subjectivity of modelers, exposure scenarios and pathways, used codes and general parameters. The various models available utilize different mathematical approaches with different complexities that can result in different predictions. Thus, for the same inputs different models can produce very different outputs. This paper presents briefly the main advances in the field of environmental radiological assessment that aim to improve the reliability of the models used in the assessment of environmental radiological impact. The intercomparison exercise of model supplied incompatible results for 137 Cs and 60 Co, enhancing the need for developing reference methodologies for environmental radiological assessment that allow to confront dose estimations in a common comparison base. The results of the intercomparison exercise are present briefly. (author)

  12. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

    Science.gov (United States)

    Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

    2017-11-24

    Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression-on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.

  13. Prediction of highly expressed genes in microbes based on chromatin accessibility

    DEFF Research Database (Denmark)

    Willenbrock, Hanni; Ussery, David

    2007-01-01

    BACKGROUND: It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed...

  14. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Directory of Open Access Journals (Sweden)

    Assaf Gottlieb

    2017-11-01

    Full Text Available Abstract Background Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into “gene level” effects. Methods Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression—on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  15. A Predictive Maintenance Model for Railway Tracks

    DEFF Research Database (Denmark)

    Li, Rui; Wen, Min; Salling, Kim Bang

    2015-01-01

    presents a mathematical model based on Mixed Integer Programming (MIP) which is designed to optimize the predictive railway tamping activities for ballasted track for the time horizon up to four years. The objective function is setup to minimize the actual costs for the tamping machine (measured by time......). Five technical and economic aspects are taken into account to schedule tamping: (1) track degradation of the standard deviation of the longitudinal level over time; (2) track geometrical alignment; (3) track quality thresholds based on the train speed limits; (4) the dependency of the track quality...

  16. Predictive Capability Maturity Model for computational modeling and simulation.

    Energy Technology Data Exchange (ETDEWEB)

    Oberkampf, William Louis; Trucano, Timothy Guy; Pilch, Martin M.

    2007-10-01

    The Predictive Capability Maturity Model (PCMM) is a new model that can be used to assess the level of maturity of computational modeling and simulation (M&S) efforts. The development of the model is based on both the authors experience and their analysis of similar investigations in the past. The perspective taken in this report is one of judging the usefulness of a predictive capability that relies on the numerical solution to partial differential equations to better inform and improve decision making. The review of past investigations, such as the Software Engineering Institute's Capability Maturity Model Integration and the National Aeronautics and Space Administration and Department of Defense Technology Readiness Levels, indicates that a more restricted, more interpretable method is needed to assess the maturity of an M&S effort. The PCMM addresses six contributing elements to M&S: (1) representation and geometric fidelity, (2) physics and material model fidelity, (3) code verification, (4) solution verification, (5) model validation, and (6) uncertainty quantification and sensitivity analysis. For each of these elements, attributes are identified that characterize four increasing levels of maturity. Importantly, the PCMM is a structured method for assessing the maturity of an M&S effort that is directed toward an engineering application of interest. The PCMM does not assess whether the M&S effort, the accuracy of the predictions, or the performance of the engineering system satisfies or does not satisfy specified application requirements.

  17. Effective modelling for predictive analytics in data science ...

    African Journals Online (AJOL)

    Effective modelling for predictive analytics in data science. ... the nearabsence of empirical or factual predictive analytics in the mainstream research going on ... Keywords: Predictive Analytics, Big Data, Business Intelligence, Project Planning.

  18. Combining GPS measurements and IRI model predictions

    International Nuclear Information System (INIS)

    Hernandez-Pajares, M.; Juan, J.M.; Sanz, J.; Bilitza, D.

    2002-01-01

    The free electrons distributed in the ionosphere (between one hundred and thousands of km in height) produce a frequency-dependent effect on Global Positioning System (GPS) signals: a delay in the pseudo-orange and an advance in the carrier phase. These effects are proportional to the columnar electron density between the satellite and receiver, i.e. the integrated electron density along the ray path. Global ionospheric TEC (total electron content) maps can be obtained with GPS data from a network of ground IGS (international GPS service) reference stations with an accuracy of few TEC units. The comparison with the TOPEX TEC, mainly measured over the oceans far from the IGS stations, shows a mean bias and standard deviation of about 2 and 5 TECUs respectively. The discrepancies between the STEC predictions and the observed values show an RMS typically below 5 TECUs (which also includes the alignment code noise). he existence of a growing database 2-hourly global TEC maps and with resolution of 5x2.5 degrees in longitude and latitude can be used to improve the IRI prediction capability of the TEC. When the IRI predictions and the GPS estimations are compared for a three month period around the Solar Maximum, they are in good agreement for middle latitudes. An over-determination of IRI TEC has been found at the extreme latitudes, the IRI predictions being, typically two times higher than the GPS estimations. Finally, local fits of the IRI model can be done by tuning the SSN from STEC GPS observations

  19. Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.

    Science.gov (United States)

    Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F

    2013-04-01

    In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.

  20. Mathematical models for indoor radon prediction

    International Nuclear Information System (INIS)

    Malanca, A.; Pessina, V.; Dallara, G.

    1995-01-01

    It is known that the indoor radon (Rn) concentration can be predicted by means of mathematical models. The simplest model relies on two variables only: the Rn source strength and the air exchange rate. In the Lawrence Berkeley Laboratory (LBL) model several environmental parameters are combined into a complex equation; besides, a correlation between the ventilation rate and the Rn entry rate from the soil is admitted. The measurements were carried out using activated carbon canisters. Seventy-five measurements of Rn concentrations were made inside two rooms placed on the second floor of a building block. One of the rooms had a single-glazed window whereas the other room had a double pane window. During three different experimental protocols, the mean Rn concentration was always higher into the room with a double-glazed window. That behavior can be accounted for by the simplest model. A further set of 450 Rn measurements was collected inside a ground-floor room with a grounding well in it. This trend maybe accounted for by the LBL model

  1. Towards predictive models for transitionally rough surfaces

    Science.gov (United States)

    Abderrahaman-Elena, Nabil; Garcia-Mayoral, Ricardo

    2017-11-01

    We analyze and model the previously presented decomposition for flow variables in DNS of turbulence over transitionally rough surfaces. The flow is decomposed into two contributions: one produced by the overlying turbulence, which has no footprint of the surface texture, and one induced by the roughness, which is essentially the time-averaged flow around the surface obstacles, but modulated in amplitude by the first component. The roughness-induced component closely resembles the laminar steady flow around the roughness elements at the same non-dimensional roughness size. For small - yet transitionally rough - textures, the roughness-free component is essentially the same as over a smooth wall. Based on these findings, we propose predictive models for the onset of the transitionally rough regime. Project supported by the Engineering and Physical Sciences Research Council (EPSRC).

  2. Genome-wide targeted prediction of ABA responsive genes in rice based on over-represented cis-motif in co-expressed genes.

    Science.gov (United States)

    Lenka, Sangram K; Lohia, Bikash; Kumar, Abhay; Chinnusamy, Viswanathan; Bansal, Kailash C

    2009-02-01

    Abscisic acid (ABA), the popular plant stress hormone, plays a key role in regulation of sub-set of stress responsive genes. These genes respond to ABA through specific transcription factors which bind to cis-regulatory elements present in their promoters. We discovered the ABA Responsive Element (ABRE) core (ACGT) containing CGMCACGTGB motif as over-represented motif among the promoters of ABA responsive co-expressed genes in rice. Targeted gene prediction strategy using this motif led to the identification of 402 protein coding genes potentially regulated by ABA-dependent molecular genetic network. RT-PCR analysis of arbitrarily chosen 45 genes from the predicted 402 genes confirmed 80% accuracy of our prediction. Plant Gene Ontology (GO) analysis of ABA responsive genes showed enrichment of signal transduction and stress related genes among diverse functional categories.

  3. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    Science.gov (United States)

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequence were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a

  4. Resource-estimation models and predicted discovery

    International Nuclear Information System (INIS)

    Hill, G.W.

    1982-01-01

    Resources have been estimated by predictive extrapolation from past discovery experience, by analogy with better explored regions, or by inference from evidence of depletion of targets for exploration. Changes in technology and new insights into geological mechanisms have occurred sufficiently often in the long run to form part of the pattern of mature discovery experience. The criterion, that a meaningful resource estimate needs an objective measure of its precision or degree of uncertainty, excludes 'estimates' based solely on expert opinion. This is illustrated by development of error measures for several persuasive models of discovery and production of oil and gas in USA, both annually and in terms of increasing exploration effort. Appropriate generalizations of the models resolve many points of controversy. This is illustrated using two USA data sets describing discovery of oil and of U 3 O 8 ; the latter set highlights an inadequacy of available official data. Review of the oil-discovery data set provides a warrant for adjusting the time-series prediction to a higher resource figure for USA petroleum. (author)

  5. Predictive modeling of gingivitis severity and susceptibility via oral microbiota.

    Science.gov (United States)

    Huang, Shi; Li, Rui; Zeng, Xiaowei; He, Tao; Zhao, Helen; Chang, Alice; Bo, Cunpei; Chen, Jie; Yang, Fang; Knight, Rob; Liu, Jiquan; Davis, Catherine; Xu, Jian

    2014-09-01

    Predictive modeling of human disease based on the microbiota holds great potential yet remains challenging. Here, 50 adults underwent controlled transitions from naturally occurring gingivitis, to healthy gingivae (baseline), and to experimental gingivitis (EG). In diseased plaque microbiota, 27 bacterial genera changed in relative abundance and functional genes including 33 flagellar biosynthesis-related groups were enriched. Plaque microbiota structure exhibited a continuous gradient along the first principal component, reflecting transition from healthy to diseased states, which correlated with Mazza Gingival Index. We identified two host types with distinct gingivitis sensitivity. Our proposed microbial indices of gingivitis classified host types with 74% reliability, and, when tested on another 41-member cohort, distinguished healthy from diseased individuals with 95% accuracy. Furthermore, the state of the microbiota in naturally occurring gingivitis predicted the microbiota state and severity of subsequent EG (but not the state of the microbiota during the healthy baseline period). Because the effect of disease is greater than interpersonal variation in plaque, in contrast to the gut, plaque microbiota may provide advantages in predictive modeling of oral diseases.

  6. Prediction of pipeline corrosion rate based on grey Markov models

    International Nuclear Information System (INIS)

    Chen Yonghong; Zhang Dafa; Peng Guichu; Wang Yuemin

    2009-01-01

    Based on the model that combined by grey model and Markov model, the prediction of corrosion rate of nuclear power pipeline was studied. Works were done to improve the grey model, and the optimization unbiased grey model was obtained. This new model was used to predict the tendency of corrosion rate, and the Markov model was used to predict the residual errors. In order to improve the prediction precision, rolling operation method was used in these prediction processes. The results indicate that the improvement to the grey model is effective and the prediction precision of the new model combined by the optimization unbiased grey model and Markov model is better, and the use of rolling operation method may improve the prediction precision further. (authors)

  7. Concerted down-regulation of immune-system related genes predicts metastasis in colorectal carcinoma

    International Nuclear Information System (INIS)

    Fehlker, Marion; Huska, Matthew R; Jöns, Thomas; Andrade-Navarro, Miguel A; Kemmner, Wolfgang

    2014-01-01

    This study aimed at the identification of prognostic gene expression markers in early primary colorectal carcinomas without metastasis at the time point of surgery by analyzing genome-wide gene expression profiles using oligonucleotide microarrays. Cryo-conserved tumor specimens from 45 patients with early colorectal cancers were examined, with the majority of them being UICC stage II or earlier and with a follow-up time of 41–115 months. Gene expression profiling was performed using Whole Human Genome 4x44K Oligonucleotide Microarrays. Validation of microarray data was performed on five of the genes in a smaller cohort. Using a novel algorithm based on the recursive application of support vector machines (SVMs), we selected a signature of 44 probes that discriminated between patients developing later metastasis and patients with a good prognosis. Interestingly, almost half of the genes was related to the patients’ immune response and showed reduced expression in the metastatic cases. Whereas up to now gene signatures containing genes with various biological functions have been described for prediction of metastasis in CRC, in this study metastasis could be well predicted by a set of gene expression markers consisting exclusively of genes related to the MHC class II complex involved in immune response. Thus, our data emphasize that the proper function of a comprehensive network of immune response genes is of vital importance for the survival of colorectal cancer patients

  8. PRGPred: A platform for prediction of domains of resistance gene analogue (RGA in Arecaceae developed using machine learning algorithms

    Directory of Open Access Journals (Sweden)

    MATHODIYIL S. MANJULA

    2015-12-01

    Full Text Available Plant disease resistance genes (R-genes are responsible for initiation of defense mechanism against various phytopathogens. The majority of plant R-genes are members of very large multi-gene families, which encode structurally related proteins containing nucleotide binding site domains (NBS and C-terminal leucine rich repeats (LRR. Other classes possess' an extracellular LRR domain, a transmembrane domain and sometimes, an intracellular serine/threonine kinase domain. R-proteins work in pathogen perception and/or the activation of conserved defense signaling networks. In the present study, sequences representing resistance gene analogues (RGAs of coconut, arecanut, oil palm and date palm were collected from NCBI, sorted based on domains and assembled into a database. The sequences were analyzed in PRINTS database to find out the conserved domains and their motifs present in the RGAs. Based on these domains, we have also developed a tool to predict the domains of palm R-genes using various machine learning algorithms. The model files were selected based on the performance of the best classifier in training and testing. All these information is stored and made available in the online ‘PRGpred' database and prediction tool.

  9. An Operational Model for the Prediction of Jet Blast

    Science.gov (United States)

    2012-01-09

    This paper presents an operational model for the prediction of jet blast. The model was : developed based upon three modules including a jet exhaust model, jet centerline decay : model and aircraft motion model. The final analysis was compared with d...

  10. Pharmacokinetic–pharmacodynamic modelling of drug‐induced QTc interval prolongation in man: prediction from in vitro human ether‐à‐go‐go‐related gene binding and functional inhibition assays and conscious dog studies

    Science.gov (United States)

    Dubois, V F S; Casarotto, E; Danhof, M

    2016-01-01

    Background and Purpose Functional measures of human ether‐à‐go‐go‐related gene (hERG; Kv11.1) channel inhibition have been prioritized as an in vitro screening tool for candidate molecules. However, it is unclear how these results can be translated to humans. Here, we explore how data on drug binding and functional inhibition in vitro relate to QT prolongation in vivo. Using cisapride, sotalol and moxifloxacin as paradigm compounds, we assessed the relationship between drug concentrations, binding, functional measures and in vivo effects in preclinical species and humans. Experimental Approach Pharmacokinetic–pharmacodynamic modelling was used to characterize the drug effects in hERG functional patch clamp, hERG radio‐labelled dofetilide displacement experiments and QT interval in conscious dogs. Data were analysed in parallel to identify potential correlations between pharmacological activity in vitro and in vivo. Key Results An Emax model could not be used due to large variability in the functional patch clamp assay. Dofetilide displacement revealed that binding curves are unrelated to the in vivo potency estimates for QTc prolongation in dogs and humans. Mean in vitro potency estimates ranged from 99.9 nM for cisapride to 1030 μM for moxifloxacin. Conclusions and Implications The lack of standardized protocols for in vitro assays leads to significant differences in experimental conditions, making the assessment of in vitro–in vivo correlations unreliable. Identification of an accurate safety window during the screening of candidate molecules requires a quantitative framework that disentangles system‐ from drug‐specific properties under physiological conditions, enabling translation of the results to humans. Similar considerations will be relevant for the comprehensive in vitro pro‐arrhythmia assay initiative. PMID:27427789

  11. Genomic Features That Predict Allelic Imbalance in Humans Suggest Patterns of Constraint on Gene Expression Variation

    Science.gov (United States)

    Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.

    2009-01-01

    Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary

  12. Prediction of highly expressed genes in microbes based on chromatin accessibility

    Directory of Open Access Journals (Sweden)

    Ussery David W

    2007-02-01

    Full Text Available Abstract Background It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes. We compare these predictions with those based on codon adaptation index (CAI values, and also with experimental data for 6 different microbial genomes, with a particular interest in experimental data from Escherichia coli. Moreover, position preference is examined further in 328 sequenced microbial genomes. Results We find that absolute gene expression levels are correlated with the position preference in many microbial genomes. It is postulated that in these regions, the DNA may be more accessible to the transcriptional machinery. Moreover, ribosomal proteins and ribosomal RNA are encoded by DNA having significantly lower position preference values than other genes in fast-replicating microbes. Conclusion This insight into DNA structure-dependent gene expression in microbes may be exploited for predicting the expression of non-translated genes such as non-coding RNAs that may not be predicted by any of the conventional codon usage bias approaches.

  13. Accurate prediction of secondary metabolite gene clusters in filamentous fungi

    DEFF Research Database (Denmark)

    Andersen, Mikael Rørdam; Nielsen, Jakob Blæsbjerg; Klitgaard, Andreas

    2013-01-01

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify...... used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom....

  14. Gene prediction and RFX transcriptional regulation analysis using comparative genomics

    OpenAIRE

    Chu, Jeffrey Shih Chieh

    2011-01-01

    Regulatory Factor X (RFX) is a family of transcription factors (TF) that is conserved in all metazoans, in some fungi, and in only a few single-cellular organisms. Seven members are found in mammals, nine in fishes, three in fruit flies, and a single member in nematodes and fungi. RFX is involved in many different roles in humans, but a particular function that is conserved in many metazoans is its regulation of ciliogenesis. Probing over 150 genomes for the presence of RFX and ciliary genes ...

  15. Mining predicted essential genes of Brugia malayi for nematode drug targets.

    Directory of Open Access Journals (Sweden)

    Sanjay Kumar

    Full Text Available We report results from the first genome-wide application of a rational drug target selection methodology to a metazoan pathogen genome, the completed draft sequence of Brugia malayi, a parasitic nematode responsible for human lymphatic filariasis. More than 1.5 billion people worldwide are at risk of contracting lymphatic filariasis and onchocerciasis, a related filarial disease. Drug treatments for filariasis have not changed significantly in over 20 years, and with the risk of resistance rising, there is an urgent need for the development of new anti-filarial drug therapies. The recent publication of the draft genomic sequence for B. malayi enables a genome-wide search for new drug targets. However, there is no functional genomics data in B. malayi to guide the selection of potential drug targets. To circumvent this problem, we have utilized the free-living model nematode Caenorhabditis elegans as a surrogate for B. malayi. Sequence comparisons between the two genomes allow us to map C. elegans orthologs to B. malayi genes. Using these orthology mappings and by incorporating the extensive genomic and functional genomic data, including genome-wide RNAi screens, that already exist for C. elegans, we identify potentially essential genes in B. malayi. Further incorporation of human host genome sequence data and a custom algorithm for prioritization enables us to collect and rank nearly 600 drug target candidates. Previously identified potential drug targets cluster near the top of our prioritized list, lending credibility to our methodology. Over-represented Gene Ontology terms, predicted InterPro domains, and RNAi phenotypes of C. elegans orthologs associated with the potential target pool are identified. By virtue of the selection procedure, the potential B. malayi drug targets highlight components of key processes in nematode biology such as central metabolism, molting and regulation of gene expression.

  16. Data driven propulsion system weight prediction model

    Science.gov (United States)

    Gerth, Richard J.

    1994-10-01

    The objective of the research was to develop a method to predict the weight of paper engines, i.e., engines that are in the early stages of development. The impetus for the project was the Single Stage To Orbit (SSTO) project, where engineers need to evaluate alternative engine designs. Since the SSTO is a performance driven project the performance models for alternative designs were well understood. The next tradeoff is weight. Since it is known that engine weight varies with thrust levels, a model is required that would allow discrimination between engines that produce the same thrust. Above all, the model had to be rooted in data with assumptions that could be justified based on the data. The general approach was to collect data on as many existing engines as possible and build a statistical model of the engines weight as a function of various component performance parameters. This was considered a reasonable level to begin the project because the data would be readily available, and it would be at the level of most paper engines, prior to detailed component design.

  17. Predictive modeling of emergency cesarean delivery.

    Directory of Open Access Journals (Sweden)

    Carlos Campillo-Artero

    Full Text Available To increase discriminatory accuracy (DA for emergency cesarean sections (ECSs.We prospectively collected data on and studied all 6,157 births occurring in 2014 at four public hospitals located in three different autonomous communities of Spain. To identify risk factors (RFs for ECS, we used likelihood ratios and logistic regression, fitted a classification tree (CTREE, and analyzed a random forest model (RFM. We used the areas under the receiver-operating-characteristic (ROC curves (AUCs to assess their DA.The magnitude of the LR+ for all putative individual RFs and ORs in the logistic regression models was low to moderate. Except for parity, all putative RFs were positively associated with ECS, including hospital fixed-effects and night-shift delivery. The DA of all logistic models ranged from 0.74 to 0.81. The most relevant RFs (pH, induction, and previous C-section in the CTREEs showed the highest ORs in the logistic models. The DA of the RFM and its most relevant interaction terms was even higher (AUC = 0.94; 95% CI: 0.93-0.95.Putative fetal, maternal, and contextual RFs alone fail to achieve reasonable DA for ECS. It is the combination of these RFs and the interactions between them at each hospital that make it possible to improve the DA for the type of delivery and tailor interventions through prediction to improve the appropriateness of ECS indications.

  18. Model Predictive Control based on Finite Impulse Response Models

    DEFF Research Database (Denmark)

    Prasath, Guru; Jørgensen, John Bagterp

    2008-01-01

    We develop a regularized l2 finite impulse response (FIR) predictive controller with input and input-rate constraints. Feedback is based on a simple constant output disturbance filter. The performance of the predictive controller in the face of plant-model mismatch is investigated by simulations...... and related to the uncertainty of the impulse response coefficients. The simulations can be used to benchmark l2 MPC against FIR based robust MPC as well as to estimate the maximum performance improvements by robust MPC....

  19. Pathway analysis of gene signatures predicting metastasis of node-negative primary breast cancer

    International Nuclear Information System (INIS)

    Yu, Jack X; Sieuwerts, Anieta M; Zhang, Yi; Martens, John WM; Smid, Marcel; Klijn, Jan GM; Wang, Yixin; Foekens, John A

    2007-01-01

    Published prognostic gene signatures in breast cancer have few genes in common. Here we provide a rationale for this observation by studying the prognostic power and the underlying biological pathways of different gene signatures. Gene signatures to predict the development of metastases in estrogen receptor-positive and estrogen receptor-negative tumors were identified using 500 re-sampled training sets and mapping to Gene Ontology Biological Process to identify over-represented pathways. The Global Test program confirmed that gene expression profilings in the common pathways were associated with the metastasis of the patients. The apoptotic pathway and cell division, or cell growth regulation and G-protein coupled receptor signal transduction, were most significantly associated with the metastatic capability of estrogen receptor-positive or estrogen-negative tumors, respectively. A gene signature derived of the common pathways predicted metastasis in an independent cohort. Mapping of the pathways represented by different published prognostic signatures showed that they share 53% of the identified pathways. We show that divergent gene sets classifying patients for the same clinical endpoint represent similar biological processes and that pathway-derived signatures can be used to predict prognosis. Furthermore, our study reveals that the underlying biology related to aggressiveness of estrogen receptor subgroups of breast cancer is quite different

  20. Oxytocin receptor gene variation predicts subjective responses to MDMA.

    Science.gov (United States)

    Bershad, Anya K; Weafer, Jessica J; Kirkpatrick, Matthew G; Wardle, Margaret C; Miller, Melissa A; de Wit, Harriet

    2016-12-01

    3,4-Methylenedioxymethamphetamine (MDMA, "ecstasy") enhances desire to socialize and feelings of empathy, which are thought to be related to increased oxytocin levels. Thus, variation in the oxytocin receptor gene (OXTR) may influence responses to the drug. Here, we examined the influence of a single OXTR nucleotide polymorphism (SNP) on responses to MDMA in humans. Based on findings that carriers of the A allele at rs53576 exhibit reduced sensitivity to oxytocin-induced social behavior, we hypothesized that these individuals would show reduced subjective responses to MDMA, including sociability. In this three-session, double blind, within-subjects study, healthy volunteers with past MDMA experience (N = 68) received a MDMA (0, 0.75 mg/kg, and 1.5 mg/kg) and provided self-report ratings of sociability, anxiety, and drug effects. These responses were examined in relation to rs53576. MDMA (1.5 mg/kg) did not increase sociability in individuals with the A/A genotype as it did in G allele carriers. The genotypic groups did not differ in responses at the lower MDMA dose, or in cardiovascular or other subjective responses. These findings are consistent with the idea that MDMA-induced sociability is mediated by oxytocin, and that variation in the oxytocin receptor gene may influence responses to the drug.

  1. Construction of a novel multi-gene assay (42-gene classifier) for prediction of late recurrence in ER-positive breast cancer patients.

    Science.gov (United States)

    Tsunashima, Ryo; Naoi, Yasuto; Shimazu, Kenzo; Kagara, Naofumi; Shimoda, Masashi; Tanei, Tomonori; Miyake, Tomohiro; Kim, Seung Jin; Noguchi, Shinzaburo

    2018-05-04

    Prediction models for late (> 5 years) recurrence in ER-positive breast cancer need to be developed for the accurate selection of patients for extended hormonal therapy. We attempted to develop such a prediction model focusing on the differences in gene expression between breast cancers with early and late recurrence. For the training set, 779 ER-positive breast cancers treated with tamoxifen alone for 5 years were selected from the databases (GSE6532, GSE12093, GSE17705, and GSE26971). For the validation set, 221 ER-positive breast cancers treated with adjuvant hormonal therapy for 5 years with or without chemotherapy at our hospital were included. Gene expression was assayed by DNA microarray analysis (Affymetrix U133 plus 2.0). With the 42 genes differentially expressed in early and late recurrence breast cancers in the training set, a prediction model (42GC) for late recurrence was constructed. The patients classified by 42GC into the late recurrence-like group showed a significantly (P = 0.006) higher late recurrence rate as expected but a significantly (P = 1.62 × E-13) lower rate for early recurrence than non-late recurrence-like group. These observations were confirmed for the validation set, i.e., P = 0.020 for late recurrence and P = 5.70 × E-5 for early recurrence. We developed a unique prediction model (42GC) for late recurrence by focusing on the biological differences between breast cancers with early and late recurrence. Interestingly, patients in the late recurrence-like group by 42GC were at low risk for early recurrence.

  2. Functional validation of candidate genes detected by genomic feature models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Østergaard, Solveig; Kristensen, Torsten Nygaard

    2018-01-01

    Understanding the genetic underpinnings of complex traits requires knowledge of the genetic variants that contribute to phenotypic variability. Reliable statistical approaches are needed to obtain such knowledge. In genome-wide association studies, variants are tested for association with trait...... then functionally assessed whether the identified candidate genes affected locomotor activity by reducing gene expression using RNA interference. In five of the seven candidate genes tested, reduced gene expression altered the phenotype. The ranking of genes within the predictive GO term was highly correlated...

  3. Reduce manual curation by combining gene predictions from multiple annotation engines, a case study of start codon prediction.

    Directory of Open Access Journals (Sweden)

    Thomas H A Ederveen

    Full Text Available Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines provide users a straight-forward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35-52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating likeliness of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs notoriously difficult to predict start codons. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4% and with an optimal path ORF start prediction sensitivity is gained while maintaining a high specificity.

  4. Methodology for Designing Models Predicting Success of Infertility Treatment

    OpenAIRE

    Alireza Zarinara; Mohammad Mahdi Akhondi; Hojjat Zeraati; Koorsh Kamali; Kazem Mohammad

    2016-01-01

    Abstract Background: The prediction models for infertility treatment success have presented since 25 years ago. There are scientific principles for designing and applying the prediction models that is also used to predict the success rate of infertility treatment. The purpose of this study is to provide basic principles for designing the model to predic infertility treatment success. Materials and Methods: In this paper, the principles for developing predictive models are explained and...

  5. Modeling insertional mutagenesis using gene length and expression in murine embryonic stem cells.

    Directory of Open Access Journals (Sweden)

    Alex S Nord

    2007-07-01

    Full Text Available High-throughput mutagenesis of the mammalian genome is a powerful means to facilitate analysis of gene function. Gene trapping in embryonic stem cells (ESCs is the most widely used form of insertional mutagenesis in mammals. However, the rules governing its efficiency are not fully understood, and the effects of vector design on the likelihood of gene-trapping events have not been tested on a genome-wide scale.In this study, we used public gene-trap data to model gene-trap likelihood. Using the association of gene length and gene expression with gene-trap likelihood, we constructed spline-based regression models that characterize which genes are susceptible and which genes are resistant to gene-trapping techniques. We report results for three classes of gene-trap vectors, showing that both length and expression are significant determinants of trap likelihood for all vectors. Using our models, we also quantitatively identified hotspots of gene-trap activity, which represent loci where the high likelihood of vector insertion is controlled by factors other than length and expression. These formalized statistical models describe a high proportion of the variance in the likelihood of a gene being trapped by expression-dependent vectors and a lower, but still significant, proportion of the variance for vectors that are predicted to be independent of endogenous gene expression.The findings of significant expression and length effects reported here further the understanding of the determinants of vector insertion. Results from this analysis can be applied to help identify other important determinants of this important biological phenomenon and could assist planning of large-scale mutagenesis efforts.

  6. Finite Unification: Theory, Models and Predictions

    CERN Document Server

    Heinemeyer, S; Zoupanos, G

    2011-01-01

    All-loop Finite Unified Theories (FUTs) are very interesting N=1 supersymmetric Grand Unified Theories (GUTs) realising an old field theory dream, and moreover have a remarkable predictive power due to the required reduction of couplings. The reduction of the dimensionless couplings in N=1 GUTs is achieved by searching for renormalization group invariant (RGI) relations among them holding beyond the unification scale. Finiteness results from the fact that there exist RGI relations among dimensional couplings that guarantee the vanishing of all beta-functions in certain N=1 GUTs even to all orders. Furthermore developments in the soft supersymmetry breaking sector of N=1 GUTs and FUTs lead to exact RGI relations, i.e. reduction of couplings, in this dimensionful sector of the theory, too. Based on the above theoretical framework phenomenologically consistent FUTs have been constructed. Here we review FUT models based on the SU(5) and SU(3)^3 gauge groups and their predictions. Of particular interest is the Hig...

  7. Revised predictive equations for salt intrusion modelling in estuaries

    NARCIS (Netherlands)

    Gisen, J.I.A.; Savenije, H.H.G.; Nijzink, R.C.

    2015-01-01

    For one-dimensional salt intrusion models to be predictive, we need predictive equations to link model parameters to observable hydraulic and geometric variables. The one-dimensional model of Savenije (1993b) made use of predictive equations for the Van der Burgh coefficient $K$ and the dispersion

  8. Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

    Science.gov (United States)

    Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

    2018-05-10

    Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.

  9. Neutrino nucleosynthesis in supernovae: Shell model predictions

    International Nuclear Information System (INIS)

    Haxton, W.C.

    1989-01-01

    Almost all of the 3 · 10 53 ergs liberated in a core collapse supernova is radiated as neutrinos by the cooling neutron star. I will argue that these neutrinos interact with nuclei in the ejected shells of the supernovae to produce new elements. It appears that this nucleosynthesis mechanism is responsible for the galactic abundances of 7 Li, 11 B, 19 F, 138 La, and 180 Ta, and contributes significantly to the abundances of about 15 other light nuclei. I discuss shell model predictions for the charged and neutral current allowed and first-forbidden responses of the parent nuclei, as well as the spallation processes that produce the new elements. 18 refs., 1 fig., 1 tab

  10. Hierarchical Model Predictive Control for Resource Distribution

    DEFF Research Database (Denmark)

    Bendtsen, Jan Dimon; Trangbæk, K; Stoustrup, Jakob

    2010-01-01

    units. The approach is inspired by smart-grid electric power production and consumption systems, where the flexibility of a large number of power producing and/or power consuming units can be exploited in a smart-grid solution. The objective is to accommodate the load variation on the grid, arising......This paper deals with hierarchichal model predictive control (MPC) of distributed systems. A three level hierachical approach is proposed, consisting of a high level MPC controller, a second level of so-called aggregators, controlled by an online MPC-like algorithm, and a lower level of autonomous...... on one hand from varying consumption, on the other hand by natural variations in power production e.g. from wind turbines. The approach presented is based on quadratic optimization and possess the properties of low algorithmic complexity and of scalability. In particular, the proposed design methodology...

  11. Distributed model predictive control made easy

    CERN Document Server

    Negenborn, Rudy

    2014-01-01

    The rapid evolution of computer science, communication, and information technology has enabled the application of control techniques to systems beyond the possibilities of control theory just a decade ago. Critical infrastructures such as electricity, water, traffic and intermodal transport networks are now in the scope of control engineers. The sheer size of such large-scale systems requires the adoption of advanced distributed control approaches. Distributed model predictive control (MPC) is one of the promising control methodologies for control of such systems.   This book provides a state-of-the-art overview of distributed MPC approaches, while at the same time making clear directions of research that deserve more attention. The core and rationale of 35 approaches are carefully explained. Moreover, detailed step-by-step algorithmic descriptions of each approach are provided. These features make the book a comprehensive guide both for those seeking an introduction to distributed MPC as well as for those ...

  12. Model predictive control of a wind turbine modelled in Simpack

    International Nuclear Information System (INIS)

    Jassmann, U; Matzke, D; Reiter, M; Abel, D; Berroth, J; Schelenz, R; Jacobs, G

    2014-01-01

    Wind turbines (WT) are steadily growing in size to increase their power production, which also causes increasing loads acting on the turbine's components. At the same time large structures, such as the blades and the tower get more flexible. To minimize this impact, the classical control loops for keeping the power production in an optimum state are more and more extended by load alleviation strategies. These additional control loops can be unified by a multiple-input multiple-output (MIMO) controller to achieve better balancing of tuning parameters. An example for MIMO control, which has been paid more attention to recently by wind industry, is Model Predictive Control (MPC). In a MPC framework a simplified model of the WT is used to predict its controlled outputs. Based on a user-defined cost function an online optimization calculates the optimal control sequence. Thereby MPC can intrinsically incorporate constraints e.g. of actuators. Turbine models used for calculation within the MPC are typically simplified. For testing and verification usually multi body simulations, such as FAST, BLADED or FLEX5 are used to model system dynamics, but they are still limited in the number of degrees of freedom (DOF). Detailed information about load distribution (e.g. inside the gearbox) cannot be provided by such models. In this paper a Model Predictive Controller is presented and tested in a co-simulation with SlMPACK, a multi body system (MBS) simulation framework used for detailed load analysis. The analysis are performed on the basis of the IME6.0 MBS WT model, described in this paper. It is based on the rotor of the NREL 5MW WT and consists of a detailed representation of the drive train. This takes into account a flexible main shaft and its main bearings with a planetary gearbox, where all components are modelled flexible, as well as a supporting flexible main frame. The wind loads are simulated using the NREL AERODYN v13 code which has been implemented as a routine

  13. Model predictive control of a wind turbine modelled in Simpack

    Science.gov (United States)

    Jassmann, U.; Berroth, J.; Matzke, D.; Schelenz, R.; Reiter, M.; Jacobs, G.; Abel, D.

    2014-06-01

    Wind turbines (WT) are steadily growing in size to increase their power production, which also causes increasing loads acting on the turbine's components. At the same time large structures, such as the blades and the tower get more flexible. To minimize this impact, the classical control loops for keeping the power production in an optimum state are more and more extended by load alleviation strategies. These additional control loops can be unified by a multiple-input multiple-output (MIMO) controller to achieve better balancing of tuning parameters. An example for MIMO control, which has been paid more attention to recently by wind industry, is Model Predictive Control (MPC). In a MPC framework a simplified model of the WT is used to predict its controlled outputs. Based on a user-defined cost function an online optimization calculates the optimal control sequence. Thereby MPC can intrinsically incorporate constraints e.g. of actuators. Turbine models used for calculation within the MPC are typically simplified. For testing and verification usually multi body simulations, such as FAST, BLADED or FLEX5 are used to model system dynamics, but they are still limited in the number of degrees of freedom (DOF). Detailed information about load distribution (e.g. inside the gearbox) cannot be provided by such models. In this paper a Model Predictive Controller is presented and tested in a co-simulation with SlMPACK, a multi body system (MBS) simulation framework used for detailed load analysis. The analysis are performed on the basis of the IME6.0 MBS WT model, described in this paper. It is based on the rotor of the NREL 5MW WT and consists of a detailed representation of the drive train. This takes into account a flexible main shaft and its main bearings with a planetary gearbox, where all components are modelled flexible, as well as a supporting flexible main frame. The wind loads are simulated using the NREL AERODYN v13 code which has been implemented as a routine to

  14. Poisson Mixture Regression Models for Heart Disease Prediction.

    Science.gov (United States)

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.

  15. Predicting Genes Involved in Human Cancer Using Network Contextual Information

    Directory of Open Access Journals (Sweden)

    Rahmani Hossein

    2012-03-01

    Full Text Available Protein-Protein Interaction (PPI networks have been widely used for the task of predicting proteins involved in cancer. Previous research has shown that functional information about the protein for which a prediction is made, proximity to specific other proteins in the PPI network, as well as local network structure are informative features in this respect. In this work, we introduce two new types of input features, reflecting additional information: (1 Functional Context: the functions of proteins interacting with the target protein (rather than the protein itself; and (2 Structural Context: the relative position of the target protein with respect to specific other proteins selected according to a novel ANOVA (analysis of variance based measure. We also introduce a selection strategy to pinpoint the most informative features. Results show that the proposed feature types and feature selection strategy yield informative features. A standard machine learning method (Naive Bayes that uses the features proposed here outperforms the current state-of-the-art methods by more than 5% with respect to F-measure. In addition, manual inspection confirms the biological relevance of the top-ranked features.

  16. The Mouse Solitary Odorant Receptor Gene Promoters as Models for the Study of Odorant Receptor Gene Choice.

    Directory of Open Access Journals (Sweden)

    Andrea Degl'Innocenti

    Full Text Available In vertebrates, several anatomical regions located within the nasal cavity mediate olfaction. Among these, the main olfactory epithelium detects most conventional odorants. Olfactory sensory neurons, provided with cilia exposed to the air, detect volatile chemicals via an extremely large family of seven-transmembrane chemoreceptors named odorant receptors. Their genes are expressed in a monogenic and monoallelic fashion: a single allele of a single odorant receptor gene is transcribed in a given mature neuron, through a still uncharacterized molecular mechanism known as odorant receptor gene choice.Odorant receptor genes are typically arranged in genomic clusters, but a few are isolated (we call them solitary from the others within a region broader than 1 Mb upstream and downstream with respect to their transcript's coordinates. The study of clustered genes is problematic, because of redundancy and ambiguities in their regulatory elements: we propose to use the solitary genes as simplified models to understand odorant receptor gene choice.Here we define number and identity of the solitary genes in the mouse genome (C57BL/6J, and assess the conservation of the solitary status in some mammalian orthologs. Furthermore, we locate their putative promoters, predict their homeodomain binding sites (commonly present in the promoters of odorant receptor genes and compare candidate promoter sequences with those of wild-caught mice. We also provide expression data from histological sections.In the mouse genome there are eight intact solitary genes: Olfr19 (M12, Olfr49, Olfr266, Olfr267, Olfr370, Olfr371, Olfr466, Olfr1402; five are conserved as solitary in rat. These genes are all expressed in the main olfactory epithelium of three-day-old mice. The C57BL/6J candidate promoter of Olfr370 has considerably varied compared to its wild-type counterpart. Within the putative promoter for Olfr266 a homeodomain binding site is predicted. As a whole, our findings

  17. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

    Directory of Open Access Journals (Sweden)

    Surovcik Katharina

    2006-03-01

    Full Text Available Abstract Background Horizontal gene transfer (HGT is considered a strong evolutionary force shaping the content of microbial genomes in a substantial manner. It is the difference in speed enabling the rapid adaptation to changing environmental demands that distinguishes HGT from gene genesis, duplications or mutations. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs or more specifically pathogenicity or symbiotic islands. Results We have implemented the program SIGI-HMM that predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of codon usage (CU of each individual gene of a genome under study. CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in JAVA and is publicly available. It accepts as input any file created according to the EMBL-format. It generates output in the common GFF format readable for genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were both consistent with annotated GIs and with predictions generated by different methods. Conclusion SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows to interactively analyze genomes in detail and to generate or to test hypotheses about the origin of acquired

  18. An approach for reduction of false predictions in reverse engineering of gene regulatory networks.

    Science.gov (United States)

    Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar

    2018-05-14

    A gene regulatory network discloses the regulatory interactions amongst genes, at a particular condition of the human body. The accurate reconstruction of such networks from time-series genetic expression data using computational tools offers a stiff challenge for contemporary computer scientists. This is crucial to facilitate the understanding of the proper functioning of a living organism. Unfortunately, the computational methods produce many false predictions along with the correct predictions, which is unwanted. Investigations in the domain focus on the identification of as many correct regulations as possible in the reverse engineering of gene regulatory networks to make it more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we have proposed a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have implemented the same using a dataset ensemble approach (i.e. combining multiple datasets) also. We have employed the proposed methodology on real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IMRA network of Saccharomyces cerevisiae. Subsequently, we have experimented upon somewhat larger, in silico networks, namely, DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we have used four datasets in each experiment. The obtained results are encouraging enough as the proposed methodology can reduce the number of false predictions significantly, without using any supplementary prior biological information for larger gene regulatory networks. It is also observed that if a small amount of prior biological information is incorporated here, the results improve further w.r.t. the prediction of true positives

  19. Adipose gene expression prior to weight loss can differentiate and weakly predict dietary responders.

    Directory of Open Access Journals (Sweden)

    David M Mutch

    Full Text Available BACKGROUND: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. METHODOLOGY/PRINCIPAL FINDINGS: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Using several statistical tests revealed that the gene expression profiles of responders (8-12 kgs weight loss could always be differentiated from non-responders (<4 kgs weight loss. We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1%+/-8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier improved prediction accuracy to 80.9%+/-2.2%. CONCLUSION: Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition.

  20. Phylogenomic detection and functional prediction of genes potentially important for plant meiosis.

    Science.gov (United States)

    Zhang, Luoyan; Kong, Hongzhi; Ma, Hong; Yang, Ji

    2018-02-15

    Meiosis is a specialized type of cell division necessary for sexual reproduction in eukaryotes. A better understanding of the cytological procedures of meiosis has been achieved by comprehensive cytogenetic studies in plants, while the genetic mechanisms regulating meiotic progression remain incompletely understood. The increasing accumulation of complete genome sequences and large-scale gene expression datasets has provided a powerful resource for phylogenomic inference and unsupervised identification of genes involved in plant meiosis. By integrating sequence homology and expression data, 164, 131, 124 and 162 genes potentially important for meiosis were identified in the genomes of Arabidopsis thaliana, Oryza sativa, Selaginella moellendorffii and Pogonatum aloides, respectively. The predicted genes were assigned to 45 meiotic GO terms, and their functions were related to different processes occurring during meiosis in various organisms. Most of the predicted meiotic genes underwent lineage-specific duplication events during plant evolution, with about 30% of the predicted genes retaining only a single copy in higher plant genomes. The results of this study provided clues to design experiments for better functional characterization of meiotic genes in plants, promoting the phylogenomic approach to the evolutionary dynamics of the plant meiotic machineries. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. The integration of weighted human gene association networks based on link prediction.

    Science.gov (United States)

    Yang, Jian; Yang, Tinghong; Wu, Duzhi; Lin, Limei; Yang, Fan; Zhao, Jing

    2017-01-31

    Physical and functional interplays between genes or proteins have important biological meaning for cellular functions. Some efforts have been made to construct weighted gene association meta-networks by integrating multiple biological resources, where the weight indicates the confidence of the interaction. However, it is found that these existing human gene association networks share only quite limited overlapped interactions, suggesting their incompleteness and noise. Here we proposed a workflow to construct a weighted human gene association network using information of six existing networks, including two weighted specific PPI networks and four gene association meta-networks. We applied link prediction algorithm to predict possible missing links of the networks, cross-validation approach to refine each network and finally integrated the refined networks to get the final integrated network. The common information among the refined networks increases notably, suggesting their higher reliability. Our final integrated network owns much more links than most of the original networks, meanwhile its links still keep high functional relevance. Being used as background network in a case study of disease gene prediction, the final integrated network presents good performance, implying its reliability and application significance. Our workflow could be insightful for integrating and refining existing gene association data.

  2. Predictive integrated modelling for ITER scenarios

    International Nuclear Information System (INIS)

    Artaud, J.F.; Imbeaux, F.; Aniel, T.; Basiuk, V.; Eriksson, L.G.; Giruzzi, G.; Hoang, G.T.; Huysmans, G.; Joffrin, E.; Peysson, Y.; Schneider, M.; Thomas, P.

    2005-01-01

    The uncertainty on the prediction of ITER scenarios is evaluated. 2 transport models which have been extensively validated against the multi-machine database are used for the computation of the transport coefficients. The first model is GLF23, the second called Kiauto is a model in which the profile of dilution coefficient is a gyro Bohm-like analytical function, renormalized in order to get profiles consistent with a given global energy confinement scaling. The package of codes CRONOS is used, it gives access to the dynamics of the discharge and allows the study of interplay between heat transport, current diffusion and sources. The main motivation of this work is to study the influence of parameters such plasma current, heat, density, impurities and toroidal moment transport. We can draw the following conclusions: 1) the target Q = 10 can be obtained in ITER hybrid scenario at I p = 13 MA, using either the DS03 two terms scaling or the GLF23 model based on the same pedestal; 2) I p = 11.3 MA, Q = 10 can be reached only assuming a very peaked pressure profile and a low pedestal; 3) at fixed Greenwald fraction, Q increases with density peaking; 4) achieving a stationary q-profile with q > 1 requires a large non-inductive current fraction (80%) that could be provided by 20 to 40 MW of LHCD; and 5) owing to the high temperature the q-profile penetration is delayed and q = 1 is reached about 600 s in ITER hybrid scenario at I p = 13 MA, in the absence of active q-profile control. (A.C.)

  3. Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Jing; Ma, Zihao; Carr, Steven A.; Mertins, Philipp; Zhang, Hui; Zhang, Zhen; Chan, Daniel W.; Ellis, Matthew J. C.; Townsend, R. Reid; Smith, Richard D.; McDermott, Jason E.; Chen, Xian; Paulovich, Amanda G.; Boja, Emily S.; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Rodland, Karin D.; Liebler, Daniel C.; Zhang, Bing

    2016-11-11

    Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies

  4. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    Energy Technology Data Exchange (ETDEWEB)

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.

  5. Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

    Science.gov (United States)

    Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

    2016-02-01

    Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  6. High-Throughput Gene Expression Profiles to Define Drug Similarity and Predict Compound Activity.

    Science.gov (United States)

    De Wolf, Hans; Cougnaud, Laure; Van Hoorde, Kirsten; De Bondt, An; Wegner, Joerg K; Ceulemans, Hugo; Göhlmann, Hinrich

    2018-04-01

    By adding biological information, beyond the chemical properties and desired effect of a compound, uncharted compound areas and connections can be explored. In this study, we add transcriptional information for 31K compounds of Janssen's primary screening deck, using the HT L1000 platform and assess (a) the transcriptional connection score for generating compound similarities, (b) machine learning algorithms for generating target activity predictions, and (c) the scaffold hopping potential of the resulting hits. We demonstrate that the transcriptional connection score is best computed from the significant genes only and should be interpreted within its confidence interval for which we provide the stats. These guidelines help to reduce noise, increase reproducibility, and enable the separation of specific and promiscuous compounds. The added value of machine learning is demonstrated for the NR3C1 and HSP90 targets. Support Vector Machine models yielded balanced accuracy values ≥80% when the expression values from DDIT4 & SERPINE1 and TMEM97 & SPR were used to predict the NR3C1 and HSP90 activity, respectively. Combining both models resulted in 22 new and confirmed HSP90-independent NR3C1 inhibitors, providing two scaffolds (i.e., pyrimidine and pyrazolo-pyrimidine), which could potentially be of interest in the treatment of depression (i.e., inhibiting the glucocorticoid receptor (i.e., NR3C1), while leaving its chaperone, HSP90, unaffected). As such, the initial hit rate increased by a factor 300, as less, but more specific chemistry could be screened, based on the upfront computed activity predictions.

  7. Gene expression variation to predict 10-year survival in lymph-node-negative breast cancer

    International Nuclear Information System (INIS)

    Karlsson, Elin; Delle, Ulla; Danielsson, Anna; Olsson, Björn; Abel, Frida; Karlsson, Per; Helou, Khalil

    2008-01-01

    It is of great significance to find better markers to correctly distinguish between high-risk and low-risk breast cancer patients since the majority of breast cancer cases are at present being overtreated. 46 tumours from node-negative breast cancer patients were studied with gene expression microarrays. A t-test was carried out in order to find a set of genes where the expression might predict clinical outcome. Two classifiers were used for evaluation of the gene lists, a correlation-based classifier and a Voting Features Interval (VFI) classifier. We then evaluated the predictive accuracy of this expression signature on tumour sets from two similar studies on lymph-node negative patients. They had both developed gene expression signatures superior to current methods in classifying node-negative breast tumours. These two signatures were also tested on our material. A list of 51 genes whose expression profiles could predict clinical outcome with high accuracy in our material (96% or 89% accuracy in cross-validation, depending on type of classifier) was developed. When tested on two independent data sets, the expression signature based on the 51 identified genes had good predictive qualities in one of the data sets (74% accuracy), whereas their predictive value on the other data set were poor, presumably due to the fact that only 23 of the 51 genes were found in that material. We also found that previously developed expression signatures could predict clinical outcome well to moderately well in our material (72% and 61%, respectively). The list of 51 genes derived in this study might have potential for clinical utility as a prognostic gene set, and may include candidate genes of potential relevance for clinical outcome in breast cancer. According to the predictions by this expression signature, 30 of the 46 patients may have benefited from different adjuvant treatment than they recieved. The research on these tumours was approved by the Medical Faculty Research

  8. Common and rare variants in the exons and regulatory regions of osteoporosis-related genes improve osteoporotic fracture risk prediction.

    Science.gov (United States)

    Lee, Seung Hun; Kang, Moo Il; Ahn, Seong Hee; Lim, Kyeong-Hye; Lee, Gun Eui; Shin, Eun-Soon; Lee, Jong-Eun; Kim, Beom-Jun; Cho, Eun-Hee; Kim, Sang-Wook; Kim, Tae-Ho; Kim, Hyun-Ju; Yoon, Kun-Ho; Lee, Won Chul; Kim, Ghi Su; Koh, Jung-Min; Kim, Shin-Yoon

    2014-11-01

    Osteoporotic fracture risk is highly heritable, but genome-wide association studies have explained only a small proportion of the heritability to date. Genetic data may improve prediction of fracture risk in osteopenic subjects and assist early intervention and management. To detect common and rare variants in coding and regulatory regions related to osteoporosis-related traits, and to investigate whether genetic profiling improves the prediction of fracture risk. This cross-sectional study was conducted in three clinical units in Korea. Postmenopausal women with extreme phenotypes (n = 982) were used for the discovery set, and 3895 participants were used for the replication set. We performed targeted resequencing of 198 genes. Genetic risk scores from common variants (GRS-C) and from common and rare variants (GRS-T) were calculated. Nineteen common variants in 17 genes (of the discovered 34 functional variants in 26 genes) and 31 rare variants in five genes (of the discovered 87 functional variants in 15 genes) were associated with one or more osteoporosis-related traits. Accuracy of fracture risk classification was improved in the osteopenic patients by adding GRS-C to fracture risk assessment models (6.8%; P risk in an osteopenic individual.

  9. lncRNA Gene Signatures for Prediction of Breast Cancer Intrinsic Subtypes and Prognosis

    Directory of Open Access Journals (Sweden)

    Silu Zhang

    2018-01-01

    Full Text Available Background: Breast cancer is intrinsically heterogeneous and is commonly classified into four main subtypes associated with distinct biological features and clinical outcomes. However, currently available data resources and methods are limited in identifying molecular subtyping on protein-coding genes, and little is known about the roles of long non-coding RNAs (lncRNAs, which occupies 98% of the whole genome. lncRNAs may also play important roles in subgrouping cancer patients and are associated with clinical phenotypes. Methods: The purpose of this project was to identify lncRNA gene signatures that are associated with breast cancer subtypes and clinical outcomes. We identified lncRNA gene signatures from The Cancer Genome Atlas (TCGA RNAseq data that are associated with breast cancer subtypes by an optimized 1-Norm SVM feature selection algorithm. We evaluated the prognostic performance of these gene signatures with a semi-supervised principal component (superPC method. Results: Although lncRNAs can independently predict breast cancer subtypes with satisfactory accuracy, a combined gene signature including both coding and non-coding genes will give the best clinically relevant prediction performance. We highlighted eight potential biomarkers (three from coding genes and five from non-coding genes that are significantly associated with survival outcomes. Conclusion: Our proposed methods are a novel means of identifying subtype-specific coding and non-coding potential biomarkers that are both clinically relevant and biologically significant.

  10. Predictive networks: a flexible, open source, web application for integration and analysis of human gene networks.

    Science.gov (United States)

    Haibe-Kains, Benjamin; Olsen, Catharina; Djebbari, Amira; Bontempi, Gianluca; Correll, Mick; Bouton, Christopher; Quackenbush, John

    2012-01-01

    Genomics provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is to use these 'known' interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from http://predictivenetworks.org. The PN code base is freely available at https://sourceforge.net/projects/predictivenets/.

  11. Building prognostic models for breast cancer patients using clinical variables and hundreds of gene expression signatures

    Directory of Open Access Journals (Sweden)

    Liu Yufeng

    2011-01-01

    Full Text Available Abstract Background Multiple breast cancer gene expression profiles have been developed that appear to provide similar abilities to predict outcome and may outperform clinical-pathologic criteria; however, the extent to which seemingly disparate profiles provide additive prognostic information is not known, nor do we know whether prognostic profiles perform equally across clinically defined breast cancer subtypes. We evaluated whether combining the prognostic powers of standard breast cancer clinical variables with a large set of gene expression signatures could improve on our ability to predict patient outcomes. Methods Using clinical-pathological variables and a collection of 323 gene expression "modules", including 115 previously published signatures, we build multivariate Cox proportional hazards models using a dataset of 550 node-negative systemically untreated breast cancer patients. Models predictive of pathological complete response (pCR to neoadjuvant chemotherapy were also built using this approach. Results We identified statistically significant prognostic models for relapse-free survival (RFS at 7 years for the entire population, and for the subgroups of patients with ER-positive, or Luminal tumors. Furthermore, we found that combined models that included both clinical and genomic parameters improved prognostication compared with models with either clinical or genomic variables alone. Finally, we were able to build statistically significant combined models for pathological complete response (pCR predictions for the entire population. Conclusions Integration of gene expression signatures and clinical-pathological factors is an improved method over either variable type alone. Highly prognostic models could be created when using all patients, and for the subset of patients with lymph node-negative and ER-positive breast cancers. Other variables beyond gene expression and clinical-pathological variables, like gene mutation status or DNA

  12. Gene prediction in the fathead minnow [Pimephales promelas] genome

    Science.gov (United States)

    The fathead minnow is a well-established model organism which has been widely used for regulatory ecotoxicity testing and research for over half century. While much information has been gathered on the organism over the years, the fathead minnow genome, a critical source of infor...

  13. Automated Protocol for Large-Scale Modeling of Gene Expression Data.

    Science.gov (United States)

    Hall, Michelle Lynn; Calkins, David; Sherman, Woody

    2016-11-28

    With the continued rise of phenotypic- and genotypic-based screening projects, computational methods to analyze, process, and ultimately make predictions in this field take on growing importance. Here we show how automated machine learning workflows can produce models that are predictive of differential gene expression as a function of a compound structure using data from A673 cells as a proof of principle. In particular, we present predictive models with an average accuracy of greater than 70% across a highly diverse ∼1000 gene expression profile. In contrast to the usual in silico design paradigm, where one interrogates a particular target-based response, this work opens the opportunity for virtual screening and lead optimization for desired multitarget gene expression profiles.

  14. Interactions of adolescent social experiences and dopamine genes to predict physical intimate partner violence perpetration.

    Directory of Open Access Journals (Sweden)

    Laura M Schwab-Reese

    Full Text Available We examined the interactions between three dopamine gene alleles (DAT1, DRD2, DRD4 previously associated with violent behavior and two components of the adolescent environment (exposure to violence, school social environment to predict adulthood physical intimate partner violence (IPV perpetration among white men and women.We used data from Wave IV of the National Longitudinal Study of Adolescent to Adult Health, a cohort study following individuals from adolescence to adulthood. Based on the prior literature, we categorized participants as at risk for each of the three dopamine genes using this coding scheme: two 10-R alleles for DAT1; at least one A-1 allele for DRD2; at least one 7-R or 8-R allele for DRD4. Adolescent exposure to violence and school social environment was measured in 1994 and 1995 when participants were in high school or middle school. Intimate partner violence perpetration was measured in 2008 when participants were 24 to 32 years old. We used simple and multivariable logistic regression models, including interactions of genes and the adolescent environments for the analysis.Presence of risk alleles was not independently associated with IPV perpetration but increasing exposure to violence and disconnection from the school social environment was associated with physical IPV perpetration. The effects of these adolescent experiences on physical IPV perpetration varied by dopamine risk allele status. Among individuals with non-risk dopamine alleles, increased exposure to violence during adolescence and perception of disconnection from the school environment were significantly associated with increased odds of physical IPV perpetration, but individuals with high risk alleles, overall, did not experience the same increase.Our results suggested the effects of adolescent environment on adulthood physical IPV perpetration varied by genetic factors. This analysis did not find a direct link between risk alleles and violence, but

  15. Integrating geophysics and hydrology for reducing the uncertainty of groundwater model predictions and improved prediction performance

    DEFF Research Database (Denmark)

    Christensen, Nikolaj Kruse; Christensen, Steen; Ferre, Ty

    the integration of geophysical data in the construction of a groundwater model increases the prediction performance. We suggest that modelers should perform a hydrogeophysical “test-bench” analysis of the likely value of geophysics data for improving groundwater model prediction performance before actually...... and the resulting predictions can be compared with predictions from the ‘true’ model. By performing this analysis we expect to give the modeler insight into how the uncertainty of model-based prediction can be reduced.......A major purpose of groundwater modeling is to help decision-makers in efforts to manage the natural environment. Increasingly, it is recognized that both the predictions of interest and their associated uncertainties should be quantified to support robust decision making. In particular, decision...

  16. A BAYESIAN NONPARAMETRIC MIXTURE MODEL FOR SELECTING GENES AND GENE SUBNETWORKS.

    Science.gov (United States)

    Zhao, Yize; Kang, Jian; Yu, Tianwei

    2014-06-01

    It is very challenging to select informative features from tens of thousands of measured features in high-throughput data analysis. Recently, several parametric/regression models have been developed utilizing the gene network information to select genes or pathways strongly associated with a clinical/biological outcome. Alternatively, in this paper, we propose a nonparametric Bayesian model for gene selection incorporating network information. In addition to identifying genes that have a strong association with a clinical outcome, our model can select genes with particular expressional behavior, in which case the regression models are not directly applicable. We show that our proposed model is equivalent to an infinity mixture model for which we develop a posterior computation algorithm based on Markov chain Monte Carlo (MCMC) methods. We also propose two fast computing algorithms that approximate the posterior simulation with good accuracy but relatively low computational cost. We illustrate our methods on simulation studies and the analysis of Spellman yeast cell cycle microarray data.

  17. A gene signature in histologically normal surgical margins is predictive of oral carcinoma recurrence

    International Nuclear Information System (INIS)

    Reis, Patricia P; Simpson, Colleen; Goldstein, David; Brown, Dale; Gilbert, Ralph; Gullane, Patrick; Irish, Jonathan; Jurisica, Igor; Kamel-Reid, Suzanne; Waldron, Levi; Perez-Ordonez, Bayardo; Pintilie, Melania; Galloni, Natalie Naranjo; Xuan, Yali; Cervigne, Nilva K; Warner, Giles C; Makitie, Antti A

    2011-01-01

    Oral Squamous Cell Carcinoma (OSCC) is a major cause of cancer death worldwide, which is mainly due to recurrence leading to treatment failure and patient death. Histological status of surgical margins is a currently available assessment for recurrence risk in OSCC; however histological status does not predict recurrence, even in patients with histologically negative margins. Therefore, molecular analysis of histologically normal resection margins and the corresponding OSCC may aid in identifying a gene signature predictive of recurrence. We used a meta-analysis of 199 samples (OSCCs and normal oral tissues) from five public microarray datasets, in addition to our microarray analysis of 96 OSCCs and histologically normal margins from 24 patients, to train a gene signature for recurrence. Validation was performed by quantitative real-time PCR using 136 samples from an independent cohort of 30 patients. We identified 138 significantly over-expressed genes (> 2-fold, false discovery rate of 0.01) in OSCC. By penalized likelihood Cox regression, we identified a 4-gene signature with prognostic value for recurrence in our training set. This signature comprised the invasion-related genes MMP1, COL4A1, P4HA2, and THBS2. Over-expression of this 4-gene signature in histologically normal margins was associated with recurrence in our training cohort (p = 0.0003, logrank test) and in our independent validation cohort (p = 0.04, HR = 6.8, logrank test). Gene expression alterations occur in histologically normal margins in OSCC. Over-expression of the 4-gene signature in histologically normal surgical margins was validated and highly predictive of recurrence in an independent patient cohort. Our findings may be applied to develop a molecular test, which would be clinically useful to help predict which patients are at a higher risk of local recurrence

  18. MITEs in the promoters of effector genes allow prediction of novel virulence genes in Fusarium oxysporum

    NARCIS (Netherlands)

    Schmidt, S.M.; Houterman, P.M.; Schreiver, I.; Ma, L.; Amyotte, S.; Chellappan, B.; Boeren, S.; Takken, F.L.W.; Rep, M.

    2013-01-01

    Background The plant-pathogenic fungus Fusarium oxysporum f.sp.lycopersici (Fol) has accessory, lineage-specific (LS) chromosomes that can be transferred horizontally between strains. A single LS chromosome in the Fol4287 reference strain harbors all known Fol effector genes. Transfer of this

  19. Gene prediction in metagenomic fragments: A large scale machine learning approach

    Directory of Open Access Journals (Sweden)

    Morgenstern Burkhard

    2008-04-01

    Full Text Available Abstract Background Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. Short length, for example, Sanger sequencing yields on average 700 bp fragments, and unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion Large scale machine learning methods are well-suited for gene

  20. The cure: design and evaluation of a crowdsourcing game for gene selection for breast cancer survival prediction.

    Science.gov (United States)

    Good, Benjamin M; Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I

    2014-07-29

    Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player's prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. In terms of the predictive accuracy of models trained using this

  1. Analysis of functional importance of binding sites in the Drosophila gap gene network model.

    Science.gov (United States)

    Kozlov, Konstantin; Gursky, Vitaly V; Kulakovskiy, Ivan V; Dymova, Arina; Samsonova, Maria

    2015-01-01

    The statistical thermodynamics based approach provides a promising framework for construction of the genotype-phenotype map in many biological systems. Among important aspects of a good model connecting the DNA sequence information with that of a molecular phenotype (gene expression) is the selection of regulatory interactions and relevant transcription factor bindings sites. As the model may predict different levels of the functional importance of specific binding sites in different genomic and regulatory contexts, it is essential to formulate and study such models under different modeling assumptions. We elaborate a two-layer model for the Drosophila gap gene network and include in the model a combined set of transcription factor binding sites and concentration dependent regulatory interaction between gap genes hunchback and Kruppel. We show that the new variants of the model are more consistent in terms of gene expression predictions for various genetic constructs in comparison to previous work. We quantify the functional importance of binding sites by calculating their impact on gene expression in the model and calculate how these impacts correlate across all sites under different modeling assumptions. The assumption about the dual interaction between hb and Kr leads to the most consistent modeling results, but, on the other hand, may obscure existence of indirect interactions between binding sites in regulatory regions of distinct genes. The analysis confirms the previously formulated regulation concept of many weak binding sites working in concert. The model predicts a more or less uniform distribution of functionally important binding sites over the sets of experimentally characterized regulatory modules and other open chromatin domains.

  2. Neural Fuzzy Inference System-Based Weather Prediction Model and Its Precipitation Predicting Experiment

    Directory of Open Access Journals (Sweden)

    Jing Lu

    2014-11-01

    Full Text Available We propose a weather prediction model in this article based on neural network and fuzzy inference system (NFIS-WPM, and then apply it to predict daily fuzzy precipitation given meteorological premises for testing. The model consists of two parts: the first part is the “fuzzy rule-based neural network”, which simulates sequential relations among fuzzy sets using artificial neural network; and the second part is the “neural fuzzy inference system”, which is based on the first part, but could learn new fuzzy rules from the previous ones according to the algorithm we proposed. NFIS-WPM (High Pro and NFIS-WPM (Ave are improved versions of this model. It is well known that the need for accurate weather prediction is apparent when considering the benefits. However, the excessive pursuit of accuracy in weather prediction makes some of the “accurate” prediction results meaningless and the numerical prediction model is often complex and time-consuming. By adapting this novel model to a precipitation prediction problem, we make the predicted outcomes of precipitation more accurate and the prediction methods simpler than by using the complex numerical forecasting model that would occupy large computation resources, be time-consuming and which has a low predictive accuracy rate. Accordingly, we achieve more accurate predictive precipitation results than by using traditional artificial neural networks that have low predictive accuracy.

  3. Computational prediction and experimental validation of Ciona intestinalis microRNA genes

    Directory of Open Access Journals (Sweden)

    Pasquinelli Amy E

    2007-11-01

    Full Text Available Abstract Background This study reports the first collection of validated microRNA genes in the sea squirt, Ciona intestinalis. MicroRNAs are processed from hairpin precursors to ~22 nucleotide RNAs that base pair to target mRNAs and inhibit expression. As a member of the subphylum Urochordata (Tunicata whose larval form has a notochord, the sea squirt is situated at the emergence of vertebrates, and therefore may provide information about the evolution of molecular regulators of early development. Results In this study, computational methods were used to predict 14 microRNA gene families in Ciona intestinalis. The microRNA prediction algorithm utilizes configurable microRNA sequence conservation and stem-loop specificity parameters, grouping by miRNA family, and phylogenetic conservation to the related species, Ciona savignyi. The expression for 8, out of 9 attempted, of the putative microRNAs in the adult tissue of Ciona intestinalis was validated by Northern blot analyses. Additionally, a target prediction algorithm was implemented, which identified a high confidence list of 240 potential target genes. Over half of the predicted targets can be grouped into the gene ontology categories of metabolism, transport, regulation of transcription, and cell signaling. Conclusion The computational techniques implemented in this study can be applied to other organisms and serve to increase the understanding of the origins of non-coding RNAs, embryological and cellular developmental pathways, and the mechanisms for microRNA-controlled gene regulatory networks.

  4. An ensemble method to predict target genes and pathways in uveal melanoma

    Directory of Open Access Journals (Sweden)

    Wei Chao

    2018-04-01

    Full Text Available This work proposes to predict target genes and pathways for uveal melanoma (UM based on an ensemble method and pathway analyses. Methods: The ensemble method integrated a correlation method (Pearson correlation coefficient, PCC, a causal inference method (IDA and a regression method (Lasso utilizing the Borda count election method. Subsequently, to validate the performance of PIL method, comparisons between confirmed database and predicted miRNA targets were performed. Ultimately, pathway enrichment analysis was conducted on target genes in top 1000 miRNA-mRNA interactions to identify target pathways for UM patients. Results: Thirty eight of the predicted interactions were matched with the confirmed interactions, indicating that the ensemble method was a suitable and feasible approach to predict miRNA targets. We obtained 50 seed miRNA-mRNA interactions of UM patients and extracted target genes from these interactions, such as ASPG, BSDC1 and C4BP. The 601 target genes in top 1,000 miRNA-mRNA interactions were enriched in 12 target pathways, of which Phototransduction was the most significant one. Conclusion: The target genes and pathways might provide a new way to reveal the molecular mechanism of UM and give hand for target treatments and preventions of this malignant tumor.

  5. CAsubtype: An R Package to Identify Gene Sets Predictive of Cancer Subtypes and Clinical Outcomes.

    Science.gov (United States)

    Kong, Hualei; Tong, Pan; Zhao, Xiaodong; Sun, Jielin; Li, Hua

    2018-03-01

    In the past decade, molecular classification of cancer has gained high popularity owing to its high predictive power on clinical outcomes as compared with traditional methods commonly used in clinical practice. In particular, using gene expression profiles, recent studies have successfully identified a number of gene sets for the delineation of cancer subtypes that are associated with distinct prognosis. However, identification of such gene sets remains a laborious task due to the lack of tools with flexibility, integration and ease of use. To reduce the burden, we have developed an R package, CAsubtype, to efficiently identify gene sets predictive of cancer subtypes and clinical outcomes. By integrating more than 13,000 annotated gene sets, CAsubtype provides a comprehensive repertoire of candidates for new cancer subtype identification. For easy data access, CAsubtype further includes the gene expression and clinical data of more than 2000 cancer patients from TCGA. CAsubtype first employs principal component analysis to identify gene sets (from user-provided or package-integrated ones) with robust principal components representing significantly large variation between cancer samples. Based on these principal components, CAsubtype visualizes the sample distribution in low-dimensional space for better understanding of the distinction between samples and classifies samples into subgroups with prevalent clustering algorithms. Finally, CAsubtype performs survival analysis to compare the clinical outcomes between the identified subgroups, assessing their clinical value as potentially novel cancer subtypes. In conclusion, CAsubtype is a flexible and well-integrated tool in the R environment to identify gene sets for cancer subtype identification and clinical outcome prediction. Its simple R commands and comprehensive data sets enable efficient examination of the clinical value of any given gene set, thus facilitating hypothesis generating and testing in biological and

  6. Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets

    Directory of Open Access Journals (Sweden)

    Karacali Bilge

    2007-10-01

    Full Text Available Abstract Background Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. Prediction error of profiles identified with supervised univariate feature selection algorithms were compared to profiles selected randomly from a all genes on the microarray platform and b a list of known disease-related genes (a priori selection. We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms. Results Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection, however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across-datasets to samples profiled on the same microarray platform. Conclusion Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine

  7. Foundation Settlement Prediction Based on a Novel NGM Model

    Directory of Open Access Journals (Sweden)

    Peng-Yu Chen

    2014-01-01

    Full Text Available Prediction of foundation or subgrade settlement is very important during engineering construction. According to the fact that there are lots of settlement-time sequences with a nonhomogeneous index trend, a novel grey forecasting model called NGM (1,1,k,c model is proposed in this paper. With an optimized whitenization differential equation, the proposed NGM (1,1,k,c model has the property of white exponential law coincidence and can predict a pure nonhomogeneous index sequence precisely. We used two case studies to verify the predictive effect of NGM (1,1,k,c model for settlement prediction. The results show that this model can achieve excellent prediction accuracy; thus, the model is quite suitable for simulation and prediction of approximate nonhomogeneous index sequence and has excellent application value in settlement prediction.

  8. Nonconvex model predictive control for commercial refrigeration

    Science.gov (United States)

    Gybel Hovgaard, Tobias; Boyd, Stephen; Larsen, Lars F. S.; Bagterp Jørgensen, John

    2013-08-01

    We consider the control of a commercial multi-zone refrigeration system, consisting of several cooling units that share a common compressor, and is used to cool multiple areas or rooms. In each time period we choose cooling capacity to each unit and a common evaporation temperature. The goal is to minimise the total energy cost, using real-time electricity prices, while obeying temperature constraints on the zones. We propose a variation on model predictive control to achieve this goal. When the right variables are used, the dynamics of the system are linear, and the constraints are convex. The cost function, however, is nonconvex due to the temperature dependence of thermodynamic efficiency. To handle this nonconvexity we propose a sequential convex optimisation method, which typically converges in fewer than 5 or so iterations. We employ a fast convex quadratic programming solver to carry out the iterations, which is more than fast enough to run in real time. We demonstrate our method on a realistic model, with a full year simulation and 15-minute time periods, using historical electricity prices and weather data, as well as random variations in thermal load. These simulations show substantial cost savings, on the order of 30%, compared to a standard thermostat-based control system. Perhaps more important, we see that the method exhibits sophisticated response to real-time variations in electricity prices. This demand response is critical to help balance real-time uncertainties in generation capacity associated with large penetration of intermittent renewable energy sources in a future smart grid.

  9. Predictive Modelling of Heavy Metals in Urban Lakes

    OpenAIRE

    Lindström, Martin

    2000-01-01

    Heavy metals are well-known environmental pollutants. In this thesis predictive models for heavy metals in urban lakes are discussed and new models presented. The base of predictive modelling is empirical data from field investigations of many ecosystems covering a wide range of ecosystem characteristics. Predictive models focus on the variabilities among lakes and processes controlling the major metal fluxes. Sediment and water data for this study were collected from ten small lakes in the ...

  10. Tumour gene expression predicts response to cetuximab in patients with KRAS wild-type metastatic colorectal cancer.

    Science.gov (United States)

    Baker, J B; Dutta, D; Watson, D; Maddala, T; Munneke, B M; Shak, S; Rowinsky, E K; Xu, L-A; Harbison, C T; Clark, E A; Mauro, D J; Khambata-Ford, S

    2011-02-01

    Although it is accepted that metastatic colorectal cancers (mCRCs) that carry activating mutations in KRAS are unresponsive to anti-epidermal growth factor receptor (EGFR) monoclonal antibodies, a significant fraction of KRAS wild-type (wt) mCRCs are also unresponsive to anti-EGFR therapy. Genes encoding EGFR ligands amphiregulin (AREG) and epiregulin (EREG) are promising gene expression-based markers but have not been incorporated into a test to dichotomise KRAS wt mCRC patients with respect to sensitivity to anti-EGFR treatment. We used RT-PCR to test 110 candidate gene expression markers in primary tumours from 144 KRAS wt mCRC patients who received monotherapy with the anti-EGFR antibody cetuximab. Results were correlated with multiple clinical endpoints: disease control, objective response, and progression-free survival (PFS). Expression of many of the tested candidate genes, including EREG and AREG, strongly associate with all clinical endpoints. Using multivariate analysis with two-layer five-fold cross-validation, we constructed a four-gene predictive classifier. Strikingly, patients below the classifier cutpoint had PFS and disease control rates similar to those of patients with KRAS mutant mCRC. Gene expression appears to identify KRAS wt mCRC patients who receive little benefit from cetuximab. It will be important to test this model in an independent validation study.

  11. Minimal gene selection for classification and diagnosis prediction based on gene expression profile

    Directory of Open Access Journals (Sweden)

    Alireza Mehridehnavi

    2013-01-01

    Conclusion: We have shown that the use of two most significant genes based on their S/N ratios and selection of suitable training samples can lead to classify DLBCL patients with a rather good result. Actually with the aid of mentioned methods we could compensate lack of enough number of patients, improve accuracy of classifying and reduce complication of computations and so running time.

  12. Seasonal predictability of Kiremt rainfall in coupled general circulation models

    Science.gov (United States)

    Gleixner, Stephanie; Keenlyside, Noel S.; Demissie, Teferi D.; Counillon, François; Wang, Yiguo; Viste, Ellen

    2017-11-01

    The Ethiopian economy and population is strongly dependent on rainfall. Operational seasonal predictions for the main rainy season (Kiremt, June-September) are based on statistical approaches with Pacific sea surface temperatures (SST) as the main predictor. Here we analyse dynamical predictions from 11 coupled general circulation models for the Kiremt seasons from 1985-2005 with the forecasts starting from the beginning of May. We find skillful predictions from three of the 11 models, but no model beats a simple linear prediction model based on the predicted Niño3.4 indices. The skill of the individual models for dynamically predicting Kiremt rainfall depends on the strength of the teleconnection between Kiremt rainfall and concurrent Pacific SST in the models. Models that do not simulate this teleconnection fail to capture the observed relationship between Kiremt rainfall and the large-scale Walker circulation.

  13. MODELLING OF DYNAMIC SPEED LIMITS USING THE MODEL PREDICTIVE CONTROL

    Directory of Open Access Journals (Sweden)

    Andrey Borisovich Nikolaev

    2017-09-01

    Full Text Available The article considers the issues of traffic management using intelligent system “Car-Road” (IVHS, which consist of interacting intelligent vehicles (IV and intelligent roadside controllers. Vehicles are organized in convoy with small distances between them. All vehicles are assumed to be fully automated (throttle control, braking, steering. Proposed approaches for determining speed limits for traffic cars on the motorway using a model predictive control (MPC. The article proposes an approach to dynamic speed limit to minimize the downtime of vehicles in traffic.

  14. MJO prediction skill of the subseasonal-to-seasonal (S2S) prediction models

    Science.gov (United States)

    Son, S. W.; Lim, Y.; Kim, D.

    2017-12-01

    The Madden-Julian Oscillation (MJO), the dominant mode of tropical intraseasonal variability, provides the primary source of tropical and extratropical predictability on subseasonal to seasonal timescales. To better understand its predictability, this study conducts quantitative evaluation of MJO prediction skill in the state-of-the-art operational models participating in the subseasonal-to-seasonal (S2S) prediction project. Based on bivariate correlation coefficient of 0.5, the S2S models exhibit MJO prediction skill ranging from 12 to 36 days. These prediction skills are affected by both the MJO amplitude and phase errors, the latter becoming more important with forecast lead times. Consistent with previous studies, the MJO events with stronger initial amplitude are typically better predicted. However, essentially no sensitivity to the initial MJO phase is observed. Overall MJO prediction skill and its inter-model spread are further related with the model mean biases in moisture fields and longwave cloud-radiation feedbacks. In most models, a dry bias quickly builds up in the deep tropics, especially across the Maritime Continent, weakening horizontal moisture gradient. This likely dampens the organization and propagation of MJO. Most S2S models also underestimate the longwave cloud-radiation feedbacks in the tropics, which may affect the maintenance of the MJO convective envelop. In general, the models with a smaller bias in horizontal moisture gradient and longwave cloud-radiation feedbacks show a higher MJO prediction skill, suggesting that improving those processes would enhance MJO prediction skill.

  15. Improved animal models for testing gene therapy for atherosclerosis.

    Science.gov (United States)

    Du, Liang; Zhang, Jingwan; De Meyer, Guido R Y; Flynn, Rowan; Dichek, David A

    2014-04-01

    Gene therapy delivered to the blood vessel wall could augment current therapies for atherosclerosis, including systemic drug therapy and stenting. However, identification of clinically useful vectors and effective therapeutic transgenes remains at the preclinical stage. Identification of effective vectors and transgenes would be accelerated by availability of animal models that allow practical and expeditious testing of vessel-wall-directed gene therapy. Such models would include humanlike lesions that develop rapidly in vessels that are amenable to efficient gene delivery. Moreover, because human atherosclerosis develops in normal vessels, gene therapy that prevents atherosclerosis is most logically tested in relatively normal arteries. Similarly, gene therapy that causes atherosclerosis regression requires gene delivery to an existing lesion. Here we report development of three new rabbit models for testing vessel-wall-directed gene therapy that either prevents or reverses atherosclerosis. Carotid artery intimal lesions in these new models develop within 2-7 months after initiation of a high-fat diet and are 20-80 times larger than lesions in a model we described previously. Individual models allow generation of lesions that are relatively rich in either macrophages or smooth muscle cells, permitting testing of gene therapy strategies targeted at either cell type. Two of the models include gene delivery to essentially normal arteries and will be useful for identifying strategies that prevent lesion development. The third model generates lesions rapidly in vector-naïve animals and can be used for testing gene therapy that promotes lesion regression. These models are optimized for testing helper-dependent adenovirus (HDAd)-mediated gene therapy; however, they could be easily adapted for testing of other vectors or of different types of molecular therapies, delivered directly to the blood vessel wall. Our data also supports the promise of HDAd to deliver long

  16. A 65‑gene signature for prognostic prediction in colon adenocarcinoma.

    Science.gov (United States)

    Jiang, Hui; Du, Jun; Gu, Jiming; Jin, Liugen; Pu, Yong; Fei, Bojian

    2018-04-01

    The aim of the present study was to examine the molecular factors associated with the prognosis of colon cancer. Gene expression datasets were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus databases to screen differentially expressed genes (DEGs) between colon cancer samples and normal samples. Survival‑related genes were selected from the DEGs using the Cox regression method. A co‑expression network of survival‑related genes was then constructed, and functional clusters were extracted from this network. The significantly enriched functions and pathways of the genes in the network were identified. Using Bayesian discriminant analysis, a prognostic prediction system was established to distinguish the positive from negative prognostic samples. The discrimination efficacy of the system was validated in the GSE17538 dataset using Kaplan‑Meier survival analysis. A total of 636 and 1,892 DEGs between the colon cancer samples and normal samples were screened from the TCGA and GSE44861 dataset, respectively. There were 155 survival‑related genes selected. The co‑expression network of survival‑related genes included 138 genes, 534 lines (connections) and five functional clusters, including the signaling pathway, cellular response to cAMP, and immune system process functional clusters. The molecular function, cellular components and biological processes were the significantly enriched functions. The peroxisome proliferator‑activated receptor signaling pathway, Wnt signaling pathway, B cell receptor signaling pathway, and cytokine‑cytokine receptor interactions were the significant pathways. A prognostic prediction system based on a 65‑gene signature was established using this co‑expression network. Its discriminatory effect was validated in the TCGA dataset (P=3.56e‑12) and the GSE17538 dataset (P=1.67e‑6). The 65‑gene signature included kallikrein‑related peptidase 6 (KLK6), collagen type XI α1 (COL11A1), cartilage

  17. Butterfly, Recurrence, and Predictability in Lorenz Models

    Science.gov (United States)

    Shen, B. W.

    2017-12-01

    Over the span of 50 years, the original three-dimensional Lorenz model (3DLM; Lorenz,1963) and its high-dimensional versions (e.g., Shen 2014a and references therein) have been used for improving our understanding of the predictability of weather and climate with a focus on chaotic responses. Although the Lorenz studies focus on nonlinear processes and chaotic dynamics, people often apply a "linear" conceptual model to understand the nonlinear processes in the 3DLM. In this talk, we present examples to illustrate the common misunderstandings regarding butterfly effect and discuss the importance of solutions' recurrence and boundedness in the 3DLM and high-dimensional LMs. The first example is discussed with the following folklore that has been widely used as an analogy of the butterfly effect: "For want of a nail, the shoe was lost.For want of a shoe, the horse was lost.For want of a horse, the rider was lost.For want of a rider, the battle was lost.For want of a battle, the kingdom was lost.And all for the want of a horseshoe nail."However, in 2008, Prof. Lorenz stated that he did not feel that this verse described true chaos but that it better illustrated the simpler phenomenon of instability; and that the verse implicitly suggests that subsequent small events will not reverse the outcome (Lorenz, 2008). Lorenz's comments suggest that the verse neither describes negative (nonlinear) feedback nor indicates recurrence, the latter of which is required for the appearance of a butterfly pattern. The second example is to illustrate that the divergence of two nearby trajectories should be bounded and recurrent, as shown in Figure 1. Furthermore, we will discuss how high-dimensional LMs were derived to illustrate (1) negative nonlinear feedback that stabilizes the system within the five- and seven-dimensional LMs (5D and 7D LMs; Shen 2014a; 2015a; 2016); (2) positive nonlinear feedback that destabilizes the system within the 6D and 8D LMs (Shen 2015b; 2017); and (3

  18. The predictive value of the 70-gene signature for adjuvant chemotherapy in early breast cancer

    NARCIS (Netherlands)

    Knauer, Michael; Mook, Stella; Rutgers, Emiel J. T.; Bender, Richard A.; Hauptmann, Michael; van de Vijver, Marc J.; Koornstra, Rutger H. T.; Bueno-de-Mesquita, Jolien M.; Linn, Sabine C.; van 't Veer, Laura J.

    2010-01-01

    Multigene assays have been developed and validated to determine the prognosis of breast cancer. In this study, we assessed the additional predictive value of the 70-gene MammaPrint signature for chemotherapy (CT) benefit in addition to endocrine therapy (ET) from pooled study series. For 541

  19. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.

    Science.gov (United States)

    Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin

    2013-09-22

    High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.

  20. Biomine: predicting links between biological entities using network models of heterogeneous databases

    Directory of Open Access Journals (Sweden)

    Eronen Lauri

    2012-06-01

    Full Text Available Abstract Background Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Results Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. Conclusions The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable

  1. Auditing predictive models : a case study in crop growth

    NARCIS (Netherlands)

    Metselaar, K.

    1999-01-01

    Methods were developed to assess and quantify the predictive quality of simulation models, with the intent to contribute to evaluation of model studies by non-scientists. In a case study, two models of different complexity, LINTUL and SUCROS87, were used to predict yield of forage maize

  2. Models for predicting compressive strength and water absorption of ...

    African Journals Online (AJOL)

    This work presents a mathematical model for predicting the compressive strength and water absorption of laterite-quarry dust cement block using augmented Scheffe's simplex lattice design. The statistical models developed can predict the mix proportion that will yield the desired property. The models were tested for lack of ...

  3. The accuracy of survival time prediction for patients with glioma is improved by measuring mitotic spindle checkpoint gene expression.

    Directory of Open Access Journals (Sweden)

    Li Bie

    Full Text Available Identification of gene expression changes that improve prediction of survival time across all glioma grades would be clinically useful. Four Affymetrix GeneChip datasets from the literature, containing data from 771 glioma samples representing all WHO grades and eight normal brain samples, were used in an ANOVA model to screen for transcript changes that correlated with grade. Observations were confirmed and extended using qPCR assays on RNA derived from 38 additional glioma samples and eight normal samples for which survival data were available. RNA levels of eight major mitotic spindle assembly checkpoint (SAC genes (BUB1, BUB1B, BUB3, CENPE, MAD1L1, MAD2L1, CDC20, TTK significantly correlated with glioma grade and six also significantly correlated with survival time. In particular, the level of BUB1B expression was highly correlated with survival time (p<0.0001, and significantly outperformed all other measured parameters, including two standards; WHO grade and MIB-1 (Ki-67 labeling index. Measurement of the expression levels of a small set of SAC genes may complement histological grade and other clinical parameters for predicting survival time.

  4. Statistical and Machine Learning Models to Predict Programming Performance

    OpenAIRE

    Bergin, Susan

    2006-01-01

    This thesis details a longitudinal study on factors that influence introductory programming success and on the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, the models struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique, using a set of three significant...

  5. HOX Gene Promoter Prediction and Inter-genomic Comparison: An Evo-Devo Study

    Directory of Open Access Journals (Sweden)

    Marla A. Endriga

    2010-10-01

    Full Text Available Homeobox genes direct the anterior-posterior axis of the body plan in eukaryotic organisms. Promoter regions upstream of the Hox genes jumpstart the transcription process. CpG islands found within the promoter regions can cause silencing of these promoters. The locations of the promoter regions and the CpG islands of Homeo sapiens sapiens (human, Pan troglodytes (chimpanzee, Mus musculus (mouse, and Rattus norvegicus (brown rat are compared and related to the possible influence on the specification of the mammalian body plan. The sequence of each gene in Hox clusters A-D of the mammals considered were retrieved from Ensembl and locations of promoter regions and CpG islands predicted using Exon Finder. The predicted promoter sequences were confirmed via BLAST and verified against the Eukaryotic Promoter Database. The significance of the locations was determined using the Kruskal-Wallis test. Among the four clusters, only promoter locations in cluster B showed significant difference. HOX B genes have been linked with the control of genes that direct the development of axial morphology, particularly of the vertebral column bones. The magnitude of variation among the body plans of closely-related species can thus be partially attributed to the promoter kind, location and number, and gene inactivation via CpG methylation.

  6. Effectiveness of gene expression profiling for response prediction of rectal cancer to preoperative radiotherapy

    International Nuclear Information System (INIS)

    Ojima, Eiki; Inoue, Yasuhiro; Miki, Chikao; Kusunoki, Masato; Mori, Masaki

    2007-01-01

    Our aim was to determine whether the expression levels of specific genes could predict clinical radiosensitivity in human colorectal cancer. Radioresistant colorectal cancer cell lines were established by repeated X-ray exposure (total, 100 Gy), and the gene expressions of the parent and radioresistant cell lines were compared in a microarray analysis. To verify the microarray data, we carried out a reverse transcriptase-polymerase chain reaction analysis of identified genes in clinical samples from 30 irradiated rectal cancer patients. A comparison of the intensity data for the parent and three radioresistant cell lines revealed 17 upregulated and 142 downregulated genes in all radioresistant cell lines. Next, we focused on two upregulated genes, PTMA (prothymosin α) and EIF5a2 (eukaryotic translation initiation factor 5A), in the radioresistant cell lines. In clinical samples, the expression of PTMA was significantly higher in the minor effect group than in the major effect group (P=0.004), but there were no significant differences in EIF5a2 expression between the two groups. We identified radiation-related genes in colorectal cancer and demonstrated that PTMA may play an important role in radiosensitivity. Our findings suggest that PTMA may be a novel marker for predicting the effectiveness of radiotherapy in clinical cases. (author)

  7. Paired hormone response elements predict caveolin-1 as a glucocorticoid target gene.

    Directory of Open Access Journals (Sweden)

    Marinus F van Batenburg

    2010-01-01

    Full Text Available Glucocorticoids act in part via glucocorticoid receptor binding to hormone response elements (HREs, but their direct target genes in vivo are still largely unknown. We developed the criterion that genomic occurrence of paired HREs at an inter-HRE distance less than 200 bp predicts hormone responsiveness, based on synergy of multiple HREs, and HRE information from known target genes. This criterion predicts a substantial number of novel responsive genes, when applied to genomic regions 10 kb upstream of genes. Multiple-tissue in situ hybridization showed that mRNA expression of 6 out of 10 selected genes was induced in a tissue-specific manner in mice treated with a single dose of corticosterone, with the spleen being the most responsive organ. Caveolin-1 was strongly responsive in several organs, and the HRE pair in its upstream region showed increased occupancy by glucocorticoid receptor in response to corticosterone. Our approach allowed for discovery of novel tissue specific glucocorticoid target genes, which may exemplify responses underlying the permissive actions of glucocorticoids.

  8. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore accounts a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of EOperon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. Probabilistic Modeling and Visualization for Bankruptcy Prediction

    DEFF Research Database (Denmark)

    Antunes, Francisco; Ribeiro, Bernardete; Pereira, Francisco Camara

    2017-01-01

    In accounting and finance domains, bankruptcy prediction is of great utility for all of the economic stakeholders. The challenge of accurate assessment of business failure prediction, specially under scenarios of financial crisis, is known to be complicated. Although there have been many successful...... studies on bankruptcy detection, seldom probabilistic approaches were carried out. In this paper we assume a probabilistic point-of-view by applying Gaussian Processes (GP) in the context of bankruptcy prediction, comparing it against the Support Vector Machines (SVM) and the Logistic Regression (LR......). Using real-world bankruptcy data, an in-depth analysis is conducted showing that, in addition to a probabilistic interpretation, the GP can effectively improve the bankruptcy prediction performance with high accuracy when compared to the other approaches. We additionally generate a complete graphical...

  10. Accurate and dynamic predictive model for better prediction in medicine and healthcare.

    Science.gov (United States)

    Alanazi, H O; Abdullah, A H; Qureshi, K N; Ismail, A S

    2018-05-01

    Information and communication technologies (ICTs) have changed the trend into new integrated operations and methods in all fields of life. The health sector has also adopted new technologies to improve the systems and provide better services to customers. Predictive models in health care are also influenced from new technologies to predict the different disease outcomes. However, still, existing predictive models have suffered from some limitations in terms of predictive outcomes performance. In order to improve predictive model performance, this paper proposed a predictive model by classifying the disease predictions into different categories. To achieve this model performance, this paper uses traumatic brain injury (TBI) datasets. TBI is one of the serious diseases worldwide and needs more attention due to its seriousness and serious impacts on human life. The proposed predictive model improves the predictive performance of TBI. The TBI data set is developed and approved by neurologists to set its features. The experiment results show that the proposed model has achieved significant results including accuracy, sensitivity, and specificity.

  11. A new ensemble model for short term wind power prediction

    DEFF Research Database (Denmark)

    Madsen, Henrik; Albu, Razvan-Daniel; Felea, Ioan

    2012-01-01

    As the objective of this study, a non-linear ensemble system is used to develop a new model for predicting wind speed in short-term time scale. Short-term wind power prediction becomes an extremely important field of research for the energy sector. Regardless of the recent advancements in the re-search...... of prediction models, it was observed that different models have different capabilities and also no single model is suitable under all situations. The idea behind EPS (ensemble prediction systems) is to take advantage of the unique features of each subsystem to detain diverse patterns that exist in the dataset...

  12. Testing the predictive power of nuclear mass models

    International Nuclear Information System (INIS)

    Mendoza-Temis, J.; Morales, I.; Barea, J.; Frank, A.; Hirsch, J.G.; Vieyra, J.C. Lopez; Van Isacker, P.; Velazquez, V.

    2008-01-01

    A number of tests are introduced which probe the ability of nuclear mass models to extrapolate. Three models are analyzed in detail: the liquid drop model, the liquid drop model plus empirical shell corrections and the Duflo-Zuker mass formula. If predicted nuclei are close to the fitted ones, average errors in predicted and fitted masses are similar. However, the challenge of predicting nuclear masses in a region stabilized by shell effects (e.g., the lead region) is far more difficult. The Duflo-Zuker mass formula emerges as a powerful predictive tool

  13. Cell-specific prediction and application of drug-induced gene expression profiles.

    Science.gov (United States)

    Hodos, Rachel; Zhang, Ping; Lee, Hao-Chih; Duan, Qiaonan; Wang, Zichen; Clark, Neil R; Ma'ayan, Avi; Wang, Fei; Kidd, Brian; Hu, Jianying; Sontag, David; Dudley, Joel

    2018-01-01

    Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions--i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.

  14. Identification of estrogen responsive genes using esophageal squamous cell carcinoma (ESCC) as a model

    KAUST Repository

    Essack, Magbubah

    2012-10-26

    Background: Estrogen therapy has positively impact the treatment of several cancers, such as prostate, lung and breast cancers. Moreover, several groups have reported the importance of estrogen induced gene regulation in esophageal cancer (EC). This suggests that there could be a potential for estrogen therapy for EC. The efficient design of estrogen therapies requires as complete as possible list of genes responsive to estrogen. Our study develops a systems biology methodology using esophageal squamous cell carcinoma (ESCC) as a model to identify estrogen responsive genes. These genes, on the other hand, could be affected by estrogen therapy in ESCC.Results: Based on different sources of information we identified 418 genes implicated in ESCC. Putative estrogen responsive elements (EREs) mapped to the promoter region of the ESCC genes were used to initially identify candidate estrogen responsive genes. EREs mapped to the promoter sequence of 30.62% (128/418) of ESCC genes of which 43.75% (56/128) are known to be estrogen responsive, while 56.25% (72/128) are new candidate estrogen responsive genes. EREs did not map to 290 ESCC genes. Of these 290 genes, 50.34% (146/290) are known to be estrogen responsive. By analyzing transcription factor binding sites (TFBSs) in the promoters of the 202 (56+146) known estrogen responsive ESCC genes under study, we found that their regulatory potential may be characterized by 44 significantly over-represented co-localized TFBSs (cTFBSs). We were able to map these cTFBSs to promoters of 32 of the 72 new candidate estrogen responsive ESCC genes, thereby increasing confidence that these 32 ESCC genes are responsive to estrogen since their promoters contain both: a/mapped EREs, and b/at least four cTFBSs characteristic of ESCC genes that are responsive to estrogen. Recent publications confirm that 47% (15/32) of these 32 predicted genes are indeed responsive to estrogen.Conclusion: To the best of our knowledge our study is the first

  15. Identification of estrogen responsive genes using esophageal squamous cell carcinoma (ESCC) as a model

    KAUST Repository

    Essack, Magbubah; MacPherson, Cameron Ross; Schmeier, Sebastian; Bajic, Vladimir B.

    2012-01-01

    Background: Estrogen therapy has positively impact the treatment of several cancers, such as prostate, lung and breast cancers. Moreover, several groups have reported the importance of estrogen induced gene regulation in esophageal cancer (EC). This suggests that there could be a potential for estrogen therapy for EC. The efficient design of estrogen therapies requires as complete as possible list of genes responsive to estrogen. Our study develops a systems biology methodology using esophageal squamous cell carcinoma (ESCC) as a model to identify estrogen responsive genes. These genes, on the other hand, could be affected by estrogen therapy in ESCC.Results: Based on different sources of information we identified 418 genes implicated in ESCC. Putative estrogen responsive elements (EREs) mapped to the promoter region of the ESCC genes were used to initially identify candidate estrogen responsive genes. EREs mapped to the promoter sequence of 30.62% (128/418) of ESCC genes of which 43.75% (56/128) are known to be estrogen responsive, while 56.25% (72/128) are new candidate estrogen responsive genes. EREs did not map to 290 ESCC genes. Of these 290 genes, 50.34% (146/290) are known to be estrogen responsive. By analyzing transcription factor binding sites (TFBSs) in the promoters of the 202 (56+146) known estrogen responsive ESCC genes under study, we found that their regulatory potential may be characterized by 44 significantly over-represented co-localized TFBSs (cTFBSs). We were able to map these cTFBSs to promoters of 32 of the 72 new candidate estrogen responsive ESCC genes, thereby increasing confidence that these 32 ESCC genes are responsive to estrogen since their promoters contain both: a/mapped EREs, and b/at least four cTFBSs characteristic of ESCC genes that are responsive to estrogen. Recent publications confirm that 47% (15/32) of these 32 predicted genes are indeed responsive to estrogen.Conclusion: To the best of our knowledge our study is the first

  16. Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production.

    Directory of Open Access Journals (Sweden)

    Caroline Colijn

    2009-08-01

    Full Text Available Metabolism is central to cell physiology, and metabolic disturbances play a role in numerous disease states. Despite its importance, the ability to study metabolism at a global scale using genomic technologies is limited. In principle, complete genome sequences describe the range of metabolic reactions that are possible for an organism, but cannot quantitatively describe the behaviour of these reactions. We present a novel method for modeling metabolic states using whole cell measurements of gene expression. Our method, which we call E-Flux (as a combination of flux and expression, extends the technique of Flux Balance Analysis by modeling maximum flux constraints as a function of measured gene expression. In contrast to previous methods for metabolically interpreting gene expression data, E-Flux utilizes a model of the underlying metabolic network to directly predict changes in metabolic flux capacity. We applied E-Flux to Mycobacterium tuberculosis, the bacterium that causes tuberculosis (TB. Key components of mycobacterial cell walls are mycolic acids which are targets for several first-line TB drugs. We used E-Flux to predict the impact of 75 different drugs, drug combinations, and nutrient conditions on mycolic acid biosynthesis capacity in M. tuberculosis, using a public compendium of over 400 expression arrays. We tested our method using a model of mycolic acid biosynthesis as well as on a genome-scale model of M. tuberculosis metabolism. Our method correctly predicts seven of the eight known fatty acid inhibitors in this compendium and makes accurate predictions regarding the specificity of these compounds for fatty acid biosynthesis. Our method also predicts a number of additional potential modulators of TB mycolic acid biosynthesis. E-Flux thus provides a promising new approach for algorithmically predicting metabolic state from gene expression data.

  17. A Partial Least Square Approach for Modeling Gene-gene and Gene-environment Interactions When Multiple Markers Are Genotyped

    Science.gov (United States)

    Wang, Tao; Ho, Gloria; Ye, Kenny; Strickler, Howard; Elston, Robert C.

    2008-01-01

    Genetic association studies achieve an unprecedented level of resolution in mapping disease genes by genotyping dense SNPs in a gene region. Meanwhile, these studies require new powerful statistical tools that can optimally handle a large amount of information provided by genotype data. A question that arises is how to model interactions between two genes. Simply modeling all possible interactions between the SNPs in two gene regions is not desirable because a greatly increased number of degrees of freedom can be involved in the test statistic. We introduce an approach to reduce the genotype dimension in modeling interactions. The genotype compression of this approach is built upon the information on both the trait and the cross-locus gametic disequilibrium between SNPs in two interacting genes, in such a way as to parsimoniously model the interactions without loss of useful information in the process of dimension reduction. As a result, it improves power to detect association in the presence of gene-gene interactions. This approach can be similarly applied for modeling gene-environment interactions. We compare this method with other approaches: the corresponding test without modeling any interaction, that based on a saturated interaction model, that based on principal component analysis, and that based on Tukey’s 1-df model. Our simulations suggest that this new approach has superior power to that of the other methods. In an application to endometrial cancer case-control data from the Women’s Health Initiative (WHI), this approach detected AKT1 and AKT2 as being significantly associated with endometrial cancer susceptibility by taking into account their interactions with BMI. PMID:18615621

  18. From Predictive Models to Instructional Policies

    Science.gov (United States)

    Rollinson, Joseph; Brunskill, Emma

    2015-01-01

    At their core, Intelligent Tutoring Systems consist of a student model and a policy. The student model captures the state of the student and the policy uses the student model to individualize instruction. Policies require different properties from the student model. For example, a mastery threshold policy requires the student model to have a way…

  19. Prediction of metastasis from low-malignant breast cancer by gene expression profiling

    DEFF Research Database (Denmark)

    Thomassen, Mads; Tan, Qihua; Eiriksdottir, Freyja

    2007-01-01

    examined in these studies is the low-risk patients for whom outcome is very difficult to predict with currently used methods. These patients do not receive adjuvant treatment according to the guidelines of the Danish Breast Cancer Cooperative Group (DBCG). In this study, 26 tumors from low-risk patients...... with different characteristics and risk, expression-based classification specifically developed in low-risk patients have higher predictive power in this group.......Promising results for prediction of outcome in breast cancer have been obtained by genome wide gene expression profiling. Some studies have suggested that an extensive overtreatment of breast cancer patients might be reduced by risk assessment with gene expression profiling. A patient group hardly...

  20. The Complexity of Developmental Predictions from Dual Process Models

    Science.gov (United States)

    Stanovich, Keith E.; West, Richard F.; Toplak, Maggie E.

    2011-01-01

    Drawing developmental predictions from dual-process theories is more complex than is commonly realized. Overly simplified predictions drawn from such models may lead to premature rejection of the dual process approach as one of many tools for understanding cognitive development. Misleading predictions can be avoided by paying attention to several…

  1. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    Science.gov (United States)

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and

  2. A Nonlinear Model for Gene-Based Gene-Environment Interaction

    Directory of Open Access Journals (Sweden)

    Jian Sa

    2016-06-01

    Full Text Available A vast amount of literature has confirmed the role of gene-environment (G×E interaction in the etiology of complex human diseases. Traditional methods are predominantly focused on the analysis of interaction between a single nucleotide polymorphism (SNP and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single SNP effects are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single variant based approach, in this work, we proposed a sparse principle component regression (sPCR model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for SNPs in a gene, then the effect of each principal component was modeled by a varying-coefficient (VC model. The model can jointly model variants in a gene in which their effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR model has nice interpretation property since the sparsity on the principal component loadings can tell the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset in Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction method and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a system approach to evaluate gene-based G×E interaction.

  3. Sweat loss prediction using a multi-model approach.

    Science.gov (United States)

    Xu, Xiaojiang; Santee, William R

    2011-07-01

    A new multi-model approach (MMA) for sweat loss prediction is proposed to improve prediction accuracy. MMA was computed as the average of sweat loss predicted by two existing thermoregulation models: i.e., the rational model SCENARIO and the empirical model Heat Strain Decision Aid (HSDA). Three independent physiological datasets, a total of 44 trials, were used to compare predictions by MMA, SCENARIO, and HSDA. The observed sweat losses were collected under different combinations of uniform ensembles, environmental conditions (15-40°C, RH 25-75%), and exercise intensities (250-600 W). Root mean square deviation (RMSD), residual plots, and paired t tests were used to compare predictions with observations. Overall, MMA reduced RMSD by 30-39% in comparison with either SCENARIO or HSDA, and increased the prediction accuracy to 66% from 34% or 55%. Of the MMA predictions, 70% fell within the range of mean observed value ± SD, while only 43% of SCENARIO and 50% of HSDA predictions fell within the same range. Paired t tests showed that differences between observations and MMA predictions were not significant, but differences between observations and SCENARIO or HSDA predictions were significantly different for two datasets. Thus, MMA predicted sweat loss more accurately than either of the two single models for the three datasets used. Future work will be to evaluate MMA using additional physiological data to expand the scope of populations and conditions.

  4. Muscle myeloid type I interferon gene expression may predict therapeutic responses to rituximab in myositis patients.

    Science.gov (United States)

    Nagaraju, Kanneboyina; Ghimbovschi, Svetlana; Rayavarapu, Sree; Phadke, Aditi; Rider, Lisa G; Hoffman, Eric P; Miller, Frederick W

    2016-09-01

    To identify muscle gene expression patterns that predict rituximab responses and assess the effects of rituximab on muscle gene expression in PM and DM. In an attempt to understand the molecular mechanism of response and non-response to rituximab therapy, we performed Affymetrix gene expression array analyses on muscle biopsy specimens taken before and after rituximab therapy from eight PM and two DM patients in the Rituximab in Myositis study. We also analysed selected muscle-infiltrating cell phenotypes in these biopsies by immunohistochemical staining. Partek and Ingenuity pathway analyses assessed the gene pathways and networks. Myeloid type I IFN signature genes were expressed at higher levels at baseline in the skeletal muscle of rituximab responders than in non-responders, whereas classic non-myeloid IFN signature genes were expressed at higher levels in non-responders at baseline. Also, rituximab responders have a greater reduction of the myeloid and non-myeloid type I IFN signatures than non-responders. The decrease in the type I IFN signature following administration of rituximab may be associated with the decreases in muscle-infiltrating CD19(+) B cells and CD68(+) macrophages in responders. Our findings suggest that high levels of myeloid type I IFN gene expression in skeletal muscle predict responses to rituximab in PM/DM and that rituximab responders also have a greater decrease in the expression of these genes. These data add further evidence to recent studies defining the type I IFN signature as both a predictor of therapeutic responses and a biomarker of myositis disease activity. Published by Oxford University Press on behalf British Society for Rheumatology 2016. This work is written by US Government employees and is in the public domain in the US.

  5. Comparisons of Faulting-Based Pavement Performance Prediction Models

    Directory of Open Access Journals (Sweden)

    Weina Wang

    2017-01-01

    Full Text Available Faulting prediction is the core of concrete pavement maintenance and design. Highway agencies are always faced with the problem of lower accuracy for the prediction which causes costly maintenance. Although many researchers have developed some performance prediction models, the accuracy of prediction has remained a challenge. This paper reviews performance prediction models and JPCP faulting models that have been used in past research. Then three models including multivariate nonlinear regression (MNLR model, artificial neural network (ANN model, and Markov Chain (MC model are tested and compared using a set of actual pavement survey data taken on interstate highway with varying design features, traffic, and climate data. It is found that MNLR model needs further recalibration, while the ANN model needs more data for training the network. MC model seems a good tool for pavement performance prediction when the data is limited, but it is based on visual inspections and not explicitly related to quantitative physical parameters. This paper then suggests that the further direction for developing the performance prediction model is incorporating the advantages and disadvantages of different models to obtain better accuracy.

  6. A model of gene expression based on random dynamical systems reveals modularity properties of gene regulatory networks.

    Science.gov (United States)

    Antoneli, Fernando; Ferreira, Renata C; Briones, Marcelo R S

    2016-06-01

    Here we propose a new approach to modeling gene expression based on the theory of random dynamical systems (RDS) that provides a general coupling prescription between the nodes of any given regulatory network given the dynamics of each node is modeled by a RDS. The main virtues of this approach are the following: (i) it provides a natural way to obtain arbitrarily large networks by coupling together simple basic pieces, thus revealing the modularity of regulatory networks; (ii) the assumptions about the stochastic processes used in the modeling are fairly general, in the sense that the only requirement is stationarity; (iii) there is a well developed mathematical theory, which is a blend of smooth dynamical systems theory, ergodic theory and stochastic analysis that allows one to extract relevant dynamical and statistical information without solving the system; (iv) one may obtain the classical rate equations form the corresponding stochastic version by averaging the dynamic random variables (small noise limit). It is important to emphasize that unlike the deterministic case, where coupling two equations is a trivial matter, coupling two RDS is non-trivial, specially in our case, where the coupling is performed between a state variable of one gene and the switching stochastic process of another gene and, hence, it is not a priori true that the resulting coupled system will satisfy the definition of a random dynamical system. We shall provide the necessary arguments that ensure that our coupling prescription does indeed furnish a coupled regulatory network of random dynamical systems. Finally, the fact that classical rate equations are the small noise limit of our stochastic model ensures that any validation or prediction made on the basis of the classical theory is also a validation or prediction of our model. We illustrate our framework with some simple examples of single-gene system and network motifs. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Predictive gene signatures: molecular markers distinguishing colon adenomatous polyp and carcinoma.

    Directory of Open Access Journals (Sweden)

    Janice E Drew

    Full Text Available Cancers exhibit abnormal molecular signatures associated with disease initiation and progression. Molecular signatures could improve cancer screening, detection, drug development and selection of appropriate drug therapies for individual patients. Typically only very small amounts of tissue are available from patients for analysis and biopsy samples exhibit broad heterogeneity that cannot be captured using a single marker. This report details application of an in-house custom designed GenomeLab System multiplex gene expression assay, the hCellMarkerPlex, to assess predictive gene signatures of normal, adenomatous polyp and carcinoma colon tissue using archived tissue bank material. The hCellMarkerPlex incorporates twenty-one gene markers: epithelial (EZR, KRT18, NOX1, SLC9A2, proliferation (PCNA, CCND1, MS4A12, differentiation (B4GANLT2, CDX1, CDX2, apoptotic (CASP3, NOX1, NTN1, fibroblast (FSP1, COL1A1, structural (ACTG2, CNN1, DES, gene transcription (HDAC1, stem cell (LGR5, endothelial (VWF and mucin production (MUC2. Gene signatures distinguished normal, adenomatous polyp and carcinoma. Individual gene targets significantly contributing to molecular tissue types, classifier genes, were further characterised using real-time PCR, in-situ hybridisation and immunohistochemistry revealing aberrant epithelial expression of MS4A12, LGR5 CDX2, NOX1 and SLC9A2 prior to development of carcinoma. Identified gene signatures identify aberrant epithelial expression of genes prior to cancer development using in-house custom designed gene expression multiplex assays. This approach may be used to assist in objective classification of disease initiation, staging, progression and therapeutic responses using biopsy material.

  8. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  9. A Genomics-Based Model for Prediction of Severe Bioprosthetic Mitral Valve Calcification.

    Science.gov (United States)

    Ponasenko, Anastasia V; Khutornaya, Maria V; Kutikhin, Anton G; Rutkovskaya, Natalia V; Tsepokina, Anna V; Kondyukova, Natalia V; Yuzhalin, Arseniy E; Barbarash, Leonid S

    2016-08-31

    Severe bioprosthetic mitral valve calcification is a significant problem in cardiovascular surgery. Unfortunately, clinical markers did not demonstrate efficacy in prediction of severe bioprosthetic mitral valve calcification. Here, we examined whether a genomics-based approach is efficient in predicting the risk of severe bioprosthetic mitral valve calcification. A total of 124 consecutive Russian patients who underwent mitral valve replacement surgery were recruited. We investigated the associations of the inherited variation in innate immunity, lipid metabolism and calcium metabolism genes with severe bioprosthetic mitral valve calcification. Genotyping was conducted utilizing the TaqMan assay. Eight gene polymorphisms were significantly associated with severe bioprosthetic mitral valve calcification and were therefore included into stepwise logistic regression which identified male gender, the T/T genotype of the rs3775073 polymorphism within the TLR6 gene, the C/T genotype of the rs2229238 polymorphism within the IL6R gene, and the A/A genotype of the rs10455872 polymorphism within the LPA gene as independent predictors of severe bioprosthetic mitral valve calcification. The developed genomics-based model had fair predictive value with area under the receiver operating characteristic (ROC) curve of 0.73. In conclusion, our genomics-based approach is efficient for the prediction of severe bioprosthetic mitral valve calcification.

  10. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Full Text Available Cliques (maximal complete subnets in protein-protein interaction (PPI network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  11. Prediction model of potential hepatocarcinogenicity of rat hepatocarcinogens using a large-scale toxicogenomics database

    International Nuclear Information System (INIS)

    Uehara, Takeki; Minowa, Yohsuke; Morikawa, Yuji; Kondo, Chiaki; Maruyama, Toshiyuki; Kato, Ikuo; Nakatsu, Noriyuki; Igarashi, Yoshinobu; Ono, Atsushi; Hayashi, Hitomi; Mitsumori, Kunitoshi; Yamada, Hiroshi; Ohno, Yasuo; Urushidani, Tetsuro

    2011-01-01

    The present study was performed to develop a robust gene-based prediction model for early assessment of potential hepatocarcinogenicity of chemicals in rats by using our toxicogenomics database, TG-GATEs (Genomics-Assisted Toxicity Evaluation System developed by the Toxicogenomics Project in Japan). The positive training set consisted of high- or middle-dose groups that received 6 different non-genotoxic hepatocarcinogens during a 28-day period. The negative training set consisted of high- or middle-dose groups of 54 non-carcinogens. Support vector machine combined with wrapper-type gene selection algorithms was used for modeling. Consequently, our best classifier yielded prediction accuracies for hepatocarcinogenicity of 99% sensitivity and 97% specificity in the training data set, and false positive prediction was almost completely eliminated. Pathway analysis of feature genes revealed that the mitogen-activated protein kinase p38- and phosphatidylinositol-3-kinase-centered interactome and the v-myc myelocytomatosis viral oncogene homolog-centered interactome were the 2 most significant networks. The usefulness and robustness of our predictor were further confirmed in an independent validation data set obtained from the public database. Interestingly, similar positive predictions were obtained in several genotoxic hepatocarcinogens as well as non-genotoxic hepatocarcinogens. These results indicate that the expression profiles of our newly selected candidate biomarker genes might be common characteristics in the early stage of carcinogenesis for both genotoxic and non-genotoxic carcinogens in the rat liver. Our toxicogenomic model might be useful for the prospective screening of hepatocarcinogenicity of compounds and prioritization of compounds for carcinogenicity testing. - Highlights: →We developed a toxicogenomic model to predict hepatocarcinogenicity of chemicals. →The optimized model consisting of 9 probes had 99% sensitivity and 97% specificity.

  12. Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation.

    Directory of Open Access Journals (Sweden)

    Frank Technow

    Full Text Available Genomic selection, enabled by whole genome prediction (WGP methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E, continues to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC, a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics.

  13. Modeling of Complex Life Cycle Prediction Based on Cell Division

    Directory of Open Access Journals (Sweden)

    Fucheng Zhang

    2017-01-01

    Full Text Available Effective fault diagnosis and reasonable life expectancy are of great significance and practical engineering value for the safety, reliability, and maintenance cost of equipment and working environment. At present, the life prediction methods of the equipment are equipment life prediction based on condition monitoring, combined forecasting model, and driven data. Most of them need to be based on a large amount of data to achieve the problem. For this issue, we propose learning from the mechanism of cell division in the organism. We have established a moderate complexity of life prediction model across studying the complex multifactor correlation life model. In this paper, we model the life prediction of cell division. Experiments show that our model can effectively simulate the state of cell division. Through the model of reference, we will use it for the equipment of the complex life prediction.

  14. Risk prediction model: Statistical and artificial neural network approach

    Science.gov (United States)

    Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim

    2017-04-01

    Prediction models are increasingly gaining popularity and had been used in numerous areas of studies to complement and fulfilled clinical reasoning and decision making nowadays. The adoption of such models assist physician's decision making, individual's behavior, and consequently improve individual outcomes and the cost-effectiveness of care. The objective of this paper is to reviewed articles related to risk prediction model in order to understand the suitable approach, development and the validation process of risk prediction model. A qualitative review of the aims, methods and significant main outcomes of the nineteen published articles that developed risk prediction models from numerous fields were done. This paper also reviewed on how researchers develop and validate the risk prediction models based on statistical and artificial neural network approach. From the review done, some methodological recommendation in developing and validating the prediction model were highlighted. According to studies that had been done, artificial neural network approached in developing the prediction model were more accurate compared to statistical approach. However currently, only limited published literature discussed on which approach is more accurate for risk prediction model development.

  15. Predictive modeling and reducing cyclic variability in autoignition engines

    Science.gov (United States)

    Hellstrom, Erik; Stefanopoulou, Anna; Jiang, Li; Larimore, Jacob

    2016-08-30

    Methods and systems are provided for controlling a vehicle engine to reduce cycle-to-cycle combustion variation. A predictive model is applied to predict cycle-to-cycle combustion behavior of an engine based on observed engine performance variables. Conditions are identified, based on the predicted cycle-to-cycle combustion behavior, that indicate high cycle-to-cycle combustion variation. Corrective measures are then applied to prevent the predicted high cycle-to-cycle combustion variation.

  16. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Full Text Available Abstract Background Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmarks studies. MultiLoc2 is available as user-friendly and free web-service, available at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  17. Dynamic Simulation of Human Gait Model With Predictive Capability.

    Science.gov (United States)

    Sun, Jinming; Wu, Shaoli; Voglewede, Philip A

    2018-03-01

    In this paper, it is proposed that the central nervous system (CNS) controls human gait using a predictive control approach in conjunction with classical feedback control instead of exclusive classical feedback control theory that controls based on past error. To validate this proposition, a dynamic model of human gait is developed using a novel predictive approach to investigate the principles of the CNS. The model developed includes two parts: a plant model that represents the dynamics of human gait and a controller that represents the CNS. The plant model is a seven-segment, six-joint model that has nine degrees-of-freedom (DOF). The plant model is validated using data collected from able-bodied human subjects. The proposed controller utilizes model predictive control (MPC). MPC uses an internal model to predict the output in advance, compare the predicted output to the reference, and optimize the control input so that the predicted error is minimal. To decrease the complexity of the model, two joints are controlled using a proportional-derivative (PD) controller. The developed predictive human gait model is validated by simulating able-bodied human gait. The simulation results show that the developed model is able to simulate the kinematic output close to experimental data.

  18. A common polymorphism in a Williams syndrome gene predicts amygdala reactivity and extraversion in healthy adults

    Science.gov (United States)

    Swartz, Johnna R.; Waller, Rebecca; Bogdan, Ryan; Knodt, Annchen R.; Sabhlok, Aditi; Hyde, Luke W.; Hariri, Ahmad R.

    2015-01-01

    Background Williams syndrome (WS), a genetic disorder resulting from hemizygous microdeletion of chromosome 7q11.23, has emerged as a model for identifying the genetic architecture of socioemotional behavior. Recently, common polymorphisms in GTF2I, which is found within the WS microdeletion, have been associated with reduced social anxiety in the general population. Identifying neural phenotypes affected by these polymorphisms will help advance our understanding not only of this specific genetic association but also the broader neurogenetic mechanisms of variability in socioemotional behavior. Methods Through an ongoing parent protocol, the Duke Neurogenetics Study, we measured threat-related amygdala reactivity to fearful and angry facial expressions using functional MRI (fMRI), assessed trait personality using the Revised NEO Personality Inventory, and imputed GTF2I rs13227433 from saliva-derived DNA using custom Illumina arrays. Participants included 808 non-Hispanic Caucasian, African American, and Asian university students. Results The GTF2I rs13227433 AA genotype, previously associated with lower social anxiety, predicted decreased threat-related amygdala reactivity. An indirect effect of GTF2I genotype on the warmth facet of extraversion was mediated by decreased threat-related amygdala reactivity in women but not men. Conclusions A common polymorphism in the WS gene GTF2I associated with reduced social anxiety predicts decreased threat-related amygdala reactivity, which mediates an association between genotype and increased warmth in women. These results are consistent with reduced threat-related amygdala reactivity in WS and suggest that common variation in GTF2I contributes to broader variability in socioemotional brain function and behavior, with implications for understanding the neurogenetic bases of WS as well as social anxiety. PMID:26853120

  19. MO-DE-207B-05: Predicting Gene Mutations in Renal Cell Carcinoma Based On CT Imaging Features: Validation Using TCGA-TCIA Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Chen, X; Zhou, Z; Thomas, K; Wang, J [UT Southwestern Medical Center, Dallas, TX (United States)

    2016-06-15

    Purpose: The goal of this work is to investigate the use of contrast enhanced computed tomographic (CT) features for the prediction of mutations of BAP1, PBRM1, and VHL genes in renal cell carcinoma (RCC). Methods: For this study, we used two patient databases with renal cell carcinoma (RCC). The first one consisted of 33 patients from our institution (UT Southwestern Medical Center, UTSW). The second one consisted of 24 patients from the Cancer Imaging Archive (TCIA), where each patient is connected by a unique identi?er to the tissue samples from the Cancer Genome Atlas (TCGA). From the contrast enhanced CT image of each patient, tumor contour was first delineated by a physician. Geometry, intensity, and texture features were extracted from the delineated tumor. Based on UTSW dataset, we completed feature selection and trained a support vector machine (SVM) classifier to predict mutations of BAP1, PBRM1 and VHL genes. We then used TCIA-TCGA dataset to validate the predictive model build upon UTSW dataset. Results: The prediction accuracy of gene expression of TCIA-TCGA patients was 0.83 (20 of 24), 0.83 (20 of 24), and 0.75 (18 of 24) for BAP1, PBRM1, and VHL respectively. For BAP1 gene, texture feature was the most prominent feature type. For PBRM1 gene, intensity feature was the most prominent. For VHL gene, geometry, intensity, and texture features were all important. Conclusion: Using our feature selection strategy and models, we achieved predictive accuracy over 0.75 for all three genes under the condition of using patient data from one institution for training and data from other institutions for testing. These results suggest that radiogenomics can be used to aid in prognosis and used as convenient surrogates for expensive and time consuming gene assay procedures.

  20. Comparative Evaluation of Some Crop Yield Prediction Models ...

    African Journals Online (AJOL)

    A computer program was adopted from the work of Hill et al. (1982) to calibrate and test three of the existing yield prediction models using tropical cowpea yieldÐweather data. The models tested were Hanks Model (first and second versions). Stewart Model (first and second versions) and HallÐButcher Model. Three sets of ...

  1. Gene Expression Analysis to Assess the Relevance of Rodent Models to Human Lung Injury.

    Science.gov (United States)

    Sweeney, Timothy E; Lofgren, Shane; Khatri, Purvesh; Rogers, Angela J

    2017-08-01

    The relevance of animal models to human diseases is an area of intense scientific debate. The degree to which mouse models of lung injury recapitulate human lung injury has never been assessed. Integrating data from both human and animal expression studies allows for increased statistical power and identification of conserved differential gene expression across organisms and conditions. We sought comprehensive integration of gene expression data in experimental acute lung injury (ALI) in rodents compared with humans. We performed two separate gene expression multicohort analyses to determine differential gene expression in experimental animal and human lung injury. We used correlational and pathway analyses combined with external in vitro gene expression data to identify both potential drivers of underlying inflammation and therapeutic drug candidates. We identified 21 animal lung tissue datasets and three human lung injury bronchoalveolar lavage datasets. We show that the metasignatures of animal and human experimental ALI are significantly correlated despite these widely varying experimental conditions. The gene expression changes among mice and rats across diverse injury models (ozone, ventilator-induced lung injury, LPS) are significantly correlated with human models of lung injury (Pearson r = 0.33-0.45, P human lung injury. Predicted therapeutic targets, peptide ligand signatures, and pathway analyses are also all highly overlapping. Gene expression changes are similar in animal and human experimental ALI, and provide several physiologic and therapeutic insights to the disease.

  2. A model to predict the power output from wind farms

    Energy Technology Data Exchange (ETDEWEB)

    Landberg, L. [Riso National Lab., Roskilde (Denmark)

    1997-12-31

    This paper will describe a model that can predict the power output from wind farms. To give examples of input the model is applied to a wind farm in Texas. The predictions are generated from forecasts from the NGM model of NCEP. These predictions are made valid at individual sites (wind farms) by applying a matrix calculated by the sub-models of WASP (Wind Atlas Application and Analysis Program). The actual wind farm production is calculated using the Riso PARK model. Because of the preliminary nature of the results, they will not be given. However, similar results from Europe will be given.

  3. Modelling microbial interactions and food structure in predictive microbiology

    NARCIS (Netherlands)

    Malakar, P.K.

    2002-01-01

    Keywords: modelling, dynamic models, microbial interactions, diffusion, microgradients, colony growth, predictive microbiology.

    Growth response of microorganisms in foods is a complex process. Innovations in food production and preservation techniques have resulted in adoption of

  4. Ocean wave prediction using numerical and neural network models

    Digital Repository Service at National Institute of Oceanography (India)

    Mandal, S.; Prabaharan, N.

    This paper presents an overview of the development of the numerical wave prediction models and recently used neural networks for ocean wave hindcasting and forecasting. The numerical wave models express the physical concepts of the phenomena...

  5. Adipose Gene Expression Prior to Weight Loss Can Differentiate and Weakly Predict Dietary Responders

    Science.gov (United States)

    Mutch, David M.; Temanni, M. Ramzi; Henegar, Corneliu; Combes, Florence; Pelloux, Véronique; Holst, Claus; Sørensen, Thorkild I. A.; Astrup, Arne; Martinez, J. Alfredo; Saris, Wim H. M.; Viguerie, Nathalie; Langin, Dominique; Zucker, Jean-Daniel; Clément, Karine

    2007-01-01

    Background The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. Methodology/Principal Findings The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Using several statistical tests revealed that the gene expression profiles of responders (8–12 kgs weight loss) could always be differentiated from non-responders (diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition. PMID:18094752

  6. A mathematical model for predicting earthquake occurrence ...

    African Journals Online (AJOL)

    We consider the continental crust under damage. We use the observed results of microseism in many seismic stations of the world which was established to study the time series of the activities of the continental crust with a view to predicting possible time of occurrence of earthquake. We consider microseism time series ...

  7. Model for predicting the injury severity score.

    Science.gov (United States)

    Hagiwara, Shuichi; Oshima, Kiyohiro; Murata, Masato; Kaneko, Minoru; Aoki, Makoto; Kanbe, Masahiko; Nakamura, Takuro; Ohyama, Yoshio; Tamura, Jun'ichi

    2015-07-01

    To determine the formula that predicts the injury severity score from parameters that are obtained in the emergency department at arrival. We reviewed the medical records of trauma patients who were transferred to the emergency department of Gunma University Hospital between January 2010 and December 2010. The injury severity score, age, mean blood pressure, heart rate, Glasgow coma scale, hemoglobin, hematocrit, red blood cell count, platelet count, fibrinogen, international normalized ratio of prothrombin time, activated partial thromboplastin time, and fibrin degradation products, were examined in those patients on arrival. To determine the formula that predicts the injury severity score, multiple linear regression analysis was carried out. The injury severity score was set as the dependent variable, and the other parameters were set as candidate objective variables. IBM spss Statistics 20 was used for the statistical analysis. Statistical significance was set at P  Watson ratio was 2.200. A formula for predicting the injury severity score in trauma patients was developed with ordinary parameters such as fibrin degradation products and mean blood pressure. This formula is useful because we can predict the injury severity score easily in the emergency department.

  8. Predicting Career Advancement with Structural Equation Modelling

    Science.gov (United States)

    Heimler, Ronald; Rosenberg, Stuart; Morote, Elsa-Sofia

    2012-01-01

    Purpose: The purpose of this paper is to use the authors' prior findings concerning basic employability skills in order to determine which skills best predict career advancement potential. Design/methodology/approach: Utilizing survey responses of human resource managers, the employability skills showing the largest relationships to career…

  9. Genome sequence analysis of predicted polyprenol reductase gene from mangrove plant kandelia obovata

    Science.gov (United States)

    Basyuni, M.; Sagami, H.; Baba, S.; Oku, H.

    2018-03-01

    It has been previously reported that dolichols but not polyprenols were predominated in mangrove leaves and roots. Therefore, the occurrence of larger amounts of dolichol in leaves of mangrove plants implies that polyprenol reductase is responsible for the conversion of polyprenol to dolichol may be active in mangrove leaves. Here we report the early assessment of probably polyprenol reductase gene from genome sequence of mangrove plant Kandelia obovata. The functional assignment of the gene was based on a homology search of the sequences against the non-redundant (nr) peptide database of NCBI using Blastx. The degree of sequence identity between DNA sequence and known polyprenol reductase was confirmed using the Blastx probability E-value, total score, and identity. The genome sequence data resulted in three partial sequences, termed c23157 (700 bp), c23901 (960 bp), and c24171 (531 bp). The c23157 gene showed the highest similarity (61%) to predicted polyprenol reductase 2- like from Gossypium raimondii with E-value 2e-100. The second gene was c23901 to exhibit high similarity (78%) to the steroid 5-alpha-reductase Det2 from J. curcas with E-value 2e-140. Furthermore, the c24171 gene depicted highest similarity (79%) to the polyprenol reductase 2 isoform X1 from Jatropha curcas with E- value 7e-21.The present study suggested that the c23157, c23901, and c24171, genes may encode predicted polyprenol reductase. The c23157, c23901, c24171 are therefore the new type of predicted polyprenol reductase from K. obovata.

  10. Using gene co-expression network analysis to predict biomarkers for chronic lymphocytic leukemia

    Directory of Open Access Journals (Sweden)

    Borlawsky Tara B

    2010-10-01

    Full Text Available Abstract Background Chronic lymphocytic leukemia (CLL is the most common adult leukemia. It is a highly heterogeneous disease, and can be divided roughly into indolent and progressive stages based on classic clinical markers. Immunoglobin heavy chain variable region (IgVH mutational status was found to be associated with patient survival outcome, and biomarkers linked to the IgVH status has been a focus in the CLL prognosis research field. However, biomarkers highly correlated with IgVH mutational status which can accurately predict the survival outcome are yet to be discovered. Results In this paper, we investigate the use of gene co-expression network analysis to identify potential biomarkers for CLL. Specifically we focused on the co-expression network involving ZAP70, a well characterized biomarker for CLL. We selected 23 microarray datasets corresponding to multiple types of cancer from the Gene Expression Omnibus (GEO and used the frequent network mining algorithm CODENSE to identify highly connected gene co-expression networks spanning the entire genome, then evaluated the genes in the co-expression network in which ZAP70 is involved. We then applied a set of feature selection methods to further select genes which are capable of predicting IgVH mutation status from the ZAP70 co-expression network. Conclusions We have identified a set of genes that are potential CLL prognostic biomarkers IL2RB, CD8A, CD247, LAG3 and KLRK1, which can predict CLL patient IgVH mutational status with high accuracies. Their prognostic capabilities were cross-validated by applying these biomarker candidates to classify patients into different outcome groups using a CLL microarray datasets with clinical information.

  11. A deep learning-based multi-model ensemble method for cancer prediction.

    Science.gov (United States)

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Methylation of cancer-stem-cell-associated Wnt target genes predicts poor prognosis in colorectal cancer patients

    NARCIS (Netherlands)

    de Sousa E Melo, Felipe; Colak, Selcuk; Buikhuisen, Joyce; Koster, Jan; Cameron, Kate; de Jong, Joan H.; Tuynman, Jurriaan B.; Prasetyanti, Pramudita R.; Fessler, Evelyn; van den Bergh, Saskia P.; Rodermond, Hans; Dekker, Evelien; van der Loos, Chris M.; Pals, Steven T.; van de Vijver, Marc J.; Versteeg, Rogier; Richel, Dick J.; Vermeulen, Louis; Medema, Jan Paul

    2011-01-01

    Gene signatures derived from cancer stem cells (CSCs) predict tumor recurrence for many forms of cancer. Here, we derived a gene signature for colorectal CSCs defined by high Wnt signaling activity, which in agreement with previous observations predicts poor prognosis. Surprisingly, however, we

  13. Statistical model based gender prediction for targeted NGS clinical panels

    Directory of Open Access Journals (Sweden)

    Palani Kannan Kandavel

    2017-12-01

    The reference test dataset are being used to test the model. The sensitivity on predicting the gender has been increased from the current “genotype composition in ChrX” based approach. In addition, the prediction score given by the model can be used to evaluate the quality of clinical dataset. The higher prediction score towards its respective gender indicates the higher quality of sequenced data.

  14. A predictive pilot model for STOL aircraft landing

    Science.gov (United States)

    Kleinman, D. L.; Killingsworth, W. R.

    1974-01-01

    An optimal control approach has been used to model pilot performance during STOL flare and landing. The model is used to predict pilot landing performance for three STOL configurations, each having a different level of automatic control augmentation. Model predictions are compared with flight simulator data. It is concluded that the model can be effective design tool for studying analytically the effects of display modifications, different stability augmentation systems, and proposed changes in the landing area geometry.

  15. Estimating Model Prediction Error: Should You Treat Predictions as Fixed or Random?

    Science.gov (United States)

    Wallach, Daniel; Thorburn, Peter; Asseng, Senthold; Challinor, Andrew J.; Ewert, Frank; Jones, James W.; Rotter, Reimund; Ruane, Alexander

    2016-01-01

    Crop models are important tools for impact assessment of climate change, as well as for exploring management options under current climate. It is essential to evaluate the uncertainty associated with predictions of these models. We compare two criteria of prediction error; MSEP fixed, which evaluates mean squared error of prediction for a model with fixed structure, parameters and inputs, and MSEP uncertain( X), which evaluates mean squared error averaged over the distributions of model structure, inputs and parameters. Comparison of model outputs with data can be used to estimate the former. The latter has a squared bias term, which can be estimated using hindcasts, and a model variance term, which can be estimated from a simulation experiment. The separate contributions to MSEP uncertain (X) can be estimated using a random effects ANOVA. It is argued that MSEP uncertain (X) is the more informative uncertainty criterion, because it is specific to each prediction situation.

  16. Model-based uncertainty in species range prediction

    DEFF Research Database (Denmark)

    Pearson, R. G.; Thuiller, Wilfried; Bastos Araujo, Miguel

    2006-01-01

    Aim Many attempts to predict the potential range of species rely on environmental niche (or 'bioclimate envelope') modelling, yet the effects of using different niche-based methodologies require further investigation. Here we investigate the impact that the choice of model can have on predictions...

  17. Wind turbine control and model predictive control for uncertain systems

    DEFF Research Database (Denmark)

    Thomsen, Sven Creutz

    as disturbance models for controller design. The theoretical study deals with Model Predictive Control (MPC). MPC is an optimal control method which is characterized by the use of a receding prediction horizon. MPC has risen in popularity due to its inherent ability to systematically account for time...

  18. Testing and analysis of internal hardwood log defect prediction models

    Science.gov (United States)

    R. Edward Thomas

    2011-01-01

    The severity and location of internal defects determine the quality and value of lumber sawn from hardwood logs. Models have been developed to predict the size and position of internal defects based on external defect indicator measurements. These models were shown to predict approximately 80% of all internal knots based on external knot indicators. However, the size...

  19. Comparison of Simple Versus Performance-Based Fall Prediction Models

    Directory of Open Access Journals (Sweden)

    Shekhar K. Gadkaree BS

    2015-05-01

    Full Text Available Objective: To compare the predictive ability of standard falls prediction models based on physical performance assessments with more parsimonious prediction models based on self-reported data. Design: We developed a series of fall prediction models progressing in complexity and compared area under the receiver operating characteristic curve (AUC across models. Setting: National Health and Aging Trends Study (NHATS, which surveyed a nationally representative sample of Medicare enrollees (age ≥65 at baseline (Round 1: 2011-2012 and 1-year follow-up (Round 2: 2012-2013. Participants: In all, 6,056 community-dwelling individuals participated in Rounds 1 and 2 of NHATS. Measurements: Primary outcomes were 1-year incidence of “ any fall ” and “ recurrent falls .” Prediction models were compared and validated in development and validation sets, respectively. Results: A prediction model that included demographic information, self-reported problems with balance and coordination, and previous fall history was the most parsimonious model that optimized AUC for both any fall (AUC = 0.69, 95% confidence interval [CI] = [0.67, 0.71] and recurrent falls (AUC = 0.77, 95% CI = [0.74, 0.79] in the development set. Physical performance testing provided a marginal additional predictive value. Conclusion: A simple clinical prediction model that does not include physical performance testing could facilitate routine, widespread falls risk screening in the ambulatory care setting.

  20. Models for predicting fuel consumption in sagebrush-dominated ecosystems

    Science.gov (United States)

    Clinton S. Wright

    2013-01-01

    Fuel consumption predictions are necessary to accurately estimate or model fire effects, including pollutant emissions during wildland fires. Fuel and environmental measurements on a series of operational prescribed fires were used to develop empirical models for predicting fuel consumption in big sagebrush (Artemisia tridentate Nutt.) ecosystems....

  1. Refining the Committee Approach and Uncertainty Prediction in Hydrological Modelling

    NARCIS (Netherlands)

    Kayastha, N.

    2014-01-01

    Due to the complexity of hydrological systems a single model may be unable to capture the full range of a catchment response and accurately predict the streamflows. The multi modelling approach opens up possibilities for handling such difficulties and allows improve the predictive capability of

  2. A new, accurate predictive model for incident hypertension

    DEFF Research Database (Denmark)

    Völzke, Henry; Fung, Glenn; Ittermann, Till

    2013-01-01

    Data mining represents an alternative approach to identify new predictors of multifactorial diseases. This work aimed at building an accurate predictive model for incident hypertension using data mining procedures.......Data mining represents an alternative approach to identify new predictors of multifactorial diseases. This work aimed at building an accurate predictive model for incident hypertension using data mining procedures....

  3. Prediction models for successful external cephalic version: a systematic review

    NARCIS (Netherlands)

    Velzel, Joost; de Hundt, Marcella; Mulder, Frederique M.; Molkenboer, Jan F. M.; van der Post, Joris A. M.; Mol, Ben W.; Kok, Marjolein

    2015-01-01

    To provide an overview of existing prediction models for successful ECV, and to assess their quality, development and performance. We searched MEDLINE, EMBASE and the Cochrane Library to identify all articles reporting on prediction models for successful ECV published from inception to January 2015.

  4. Hidden Markov Model for quantitative prediction of snowfall

    Indian Academy of Sciences (India)

    A Hidden Markov Model (HMM) has been developed for prediction of quantitative snowfall in Pir-Panjal and Great Himalayan mountain ranges of Indian Himalaya. The model predicts snowfall for two days in advance using daily recorded nine meteorological variables of past 20 winters from 1992–2012. There are six ...

  5. Mathematical model for dissolved oxygen prediction in Cirata ...

    African Journals Online (AJOL)

    This paper presents the implementation and performance of mathematical model to predict theconcentration of dissolved oxygen in Cirata Reservoir, West Java by using Artificial Neural Network (ANN). The simulation program was created using Visual Studio 2012 C# software with ANN model implemented in it. Prediction ...

  6. Econometric models for predicting confusion crop ratios

    Science.gov (United States)

    Umberger, D. E.; Proctor, M. H.; Clark, J. E.; Eisgruber, L. M.; Braschler, C. B. (Principal Investigator)

    1979-01-01

    Results for both the United States and Canada show that econometric models can provide estimates of confusion crop ratios that are more accurate than historical ratios. Whether these models can support the LACIE 90/90 accuracy criterion is uncertain. In the United States, experimenting with additional model formulations could provide improved methods models in some CRD's, particularly in winter wheat. Improved models may also be possible for the Canadian CD's. The more aggressive province/state models outperformed individual CD/CRD models. This result was expected partly because acreage statistics are based on sampling procedures, and the sampling precision declines from the province/state to the CD/CRD level. Declining sampling precision and the need to substitute province/state data for the CD/CRD data introduced measurement error into the CD/CRD models.

  7. PEEX Modelling Platform for Seamless Environmental Prediction

    Science.gov (United States)

    Baklanov, Alexander; Mahura, Alexander; Arnold, Stephen; Makkonen, Risto; Petäjä, Tuukka; Kerminen, Veli-Matti; Lappalainen, Hanna K.; Ezau, Igor; Nuterman, Roman; Zhang, Wen; Penenko, Alexey; Gordov, Evgeny; Zilitinkevich, Sergej; Kulmala, Markku

    2017-04-01

    The Pan-Eurasian EXperiment (PEEX) is a multidisciplinary, multi-scale research programme stared in 2012 and aimed at resolving the major uncertainties in Earth System Science and global sustainability issues concerning the Arctic and boreal Northern Eurasian regions and in China. Such challenges include climate change, air quality, biodiversity loss, chemicalization, food supply, and the use of natural resources by mining, industry, energy production and transport. The research infrastructure introduces the current state of the art modeling platform and observation systems in the Pan-Eurasian region and presents the future baselines for the coherent and coordinated research infrastructures in the PEEX domain. The PEEX modeling Platform is characterized by a complex seamless integrated Earth System Modeling (ESM) approach, in combination with specific models of different processes and elements of the system, acting on different temporal and spatial scales. The ensemble approach is taken to the integration of modeling results from different models, participants and countries. PEEX utilizes the full potential of a hierarchy of models: scenario analysis, inverse modeling, and modeling based on measurement needs and processes. The models are validated and constrained by available in-situ and remote sensing data of various spatial and temporal scales using data assimilation and top-down modeling. The analyses of the anticipated large volumes of data produced by available models and sensors will be supported by a dedicated virtual research environment developed for these purposes.

  8. Statistical modelling of transcript profiles of differentially regulated genes

    Directory of Open Access Journals (Sweden)

    Sergeant Martin J

    2008-07-01

    Full Text Available Abstract Background The vast quantities of gene expression profiling data produced in microarray studies, and the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression. Results Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Split-line" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically-oriented, critical exponential curve, y(t = A + (B + CtRt + ε. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied. Applying the regression modelling approach to microarray-derived time course data

  9. Impact of modellers' decisions on hydrological a priori predictions

    Science.gov (United States)

    Holländer, H. M.; Bormann, H.; Blume, T.; Buytaert, W.; Chirico, G. B.; Exbrayat, J.-F.; Gustafsson, D.; Hölzel, H.; Krauße, T.; Kraft, P.; Stoll, S.; Blöschl, G.; Flühler, H.

    2014-06-01

    In practice, the catchment hydrologist is often confronted with the task of predicting discharge without having the needed records for calibration. Here, we report the discharge predictions of 10 modellers - using the model of their choice - for the man-made Chicken Creek catchment (6 ha, northeast Germany, Gerwin et al., 2009b) and we analyse how well they improved their prediction in three steps based on adding information prior to each following step. The modellers predicted the catchment's hydrological response in its initial phase without having access to the observed records. They used conceptually different physically based models and their modelling experience differed largely. Hence, they encountered two problems: (i) to simulate discharge for an ungauged catchment and (ii) using models that were developed for catchments, which are not in a state of landscape transformation. The prediction exercise was organized in three steps: (1) for the first prediction the modellers received a basic data set describing the catchment to a degree somewhat more complete than usually available for a priori predictions of ungauged catchments; they did not obtain information on stream flow, soil moisture, nor groundwater response and had therefore to guess the initial conditions; (2) before the second prediction they inspected the catchment on-site and discussed their first prediction attempt; (3) for their third prediction they were offered additional data by charging them pro forma with the costs for obtaining this additional information. Holländer et al. (2009) discussed the range of predictions obtained in step (1). Here, we detail the modeller's assumptions and decisions in accounting for the various processes. We document the prediction progress as well as the learning process resulting from the availability of added information. For the second and third steps, the progress in prediction quality is evaluated in relation to individual modelling experience and costs of

  10. Adding propensity scores to pure prediction models fails to improve predictive performance

    Directory of Open Access Journals (Sweden)

    Amy S. Nowacki

    2013-08-01

    Full Text Available Background. Propensity score usage seems to be growing in popularity leading researchers to question the possible role of propensity scores in prediction modeling, despite the lack of a theoretical rationale. It is suspected that such requests are due to the lack of differentiation regarding the goals of predictive modeling versus causal inference modeling. Therefore, the purpose of this study is to formally examine the effect of propensity scores on predictive performance. Our hypothesis is that a multivariable regression model that adjusts for all covariates will perform as well as or better than those models utilizing propensity scores with respect to model discrimination and calibration.Methods. The most commonly encountered statistical scenarios for medical prediction (logistic and proportional hazards regression were used to investigate this research question. Random cross-validation was performed 500 times to correct for optimism. The multivariable regression models adjusting for all covariates were compared with models that included adjustment for or weighting with the propensity scores. The methods were compared based on three predictive performance measures: (1 concordance indices; (2 Brier scores; and (3 calibration curves.Results. Multivariable models adjusting for all covariates had the highest average concordance index, the lowest average Brier score, and the best calibration. Propensity score adjustment and inverse probability weighting models without adjustment for all covariates performed worse than full models and failed to improve predictive performance with full covariate adjustment.Conclusion. Propensity score techniques did not improve prediction performance measures beyond multivariable adjustment. Propensity scores are not recommended if the analytical goal is pure prediction modeling.

  11. NOx PREDICTION FOR FBC BOILERS USING EMPIRICAL MODELS

    Directory of Open Access Journals (Sweden)

    Jiří Štefanica

    2014-02-01

    Full Text Available Reliable prediction of NOx emissions can provide useful information for boiler design and fuel selection. Recently used kinetic prediction models for FBC boilers are overly complex and require large computing capacity. Even so, there are many uncertainties in the case of FBC boilers. An empirical modeling approach for NOx prediction has been used exclusively for PCC boilers. No reference is available for modifying this method for FBC conditions. This paper presents possible advantages of empirical modeling based prediction of NOx emissions for FBC boilers, together with a discussion of its limitations. Empirical models are reviewed, and are applied to operation data from FBC boilers used for combusting Czech lignite coal or coal-biomass mixtures. Modifications to the model are proposed in accordance with theoretical knowledge and prediction accuracy.

  12. Complex versus simple models: ion-channel cardiac toxicity prediction.

    Science.gov (United States)

    Mistry, Hitesh B

    2018-01-01

    There is growing interest in applying detailed mathematical models of the heart for ion-channel related cardiac toxicity prediction. However, a debate as to whether such complex models are required exists. Here an assessment in the predictive performance between two established large-scale biophysical cardiac models and a simple linear model B net was conducted. Three ion-channel data-sets were extracted from literature. Each compound was designated a cardiac risk category using two different classification schemes based on information within CredibleMeds. The predictive performance of each model within each data-set for each classification scheme was assessed via a leave-one-out cross validation. Overall the B net model performed equally as well as the leading cardiac models in two of the data-sets and outperformed both cardiac models on the latest. These results highlight the importance of benchmarking complex versus simple models but also encourage the development of simple models.

  13. Complex versus simple models: ion-channel cardiac toxicity prediction

    Directory of Open Access Journals (Sweden)

    Hitesh B. Mistry

    2018-02-01

    Full Text Available There is growing interest in applying detailed mathematical models of the heart for ion-channel related cardiac toxicity prediction. However, a debate as to whether such complex models are required exists. Here an assessment in the predictive performance between two established large-scale biophysical cardiac models and a simple linear model Bnet was conducted. Three ion-channel data-sets were extracted from literature. Each compound was designated a cardiac risk category using two different classification schemes based on information within CredibleMeds. The predictive performance of each model within each data-set for each classification scheme was assessed via a leave-one-out cross validation. Overall the Bnet model performed equally as well as the leading cardiac models in two of the data-sets and outperformed both cardiac models on the latest. These results highlight the importance of benchmarking complex versus simple models but also encourage the development of simple models.

  14. Fixed recurrence and slip models better predict earthquake behavior than the time- and slip-predictable models 1: repeating earthquakes

    Science.gov (United States)

    Rubinstein, Justin L.; Ellsworth, William L.; Chen, Kate Huihsuan; Uchida, Naoki

    2012-01-01

    The behavior of individual events in repeating earthquake sequences in California, Taiwan and Japan is better predicted by a model with fixed inter-event time or fixed slip than it is by the time- and slip-predictable models for earthquake occurrence. Given that repeating earthquakes are highly regular in both inter-event time and seismic moment, the time- and slip-predictable models seem ideally suited to explain their behavior. Taken together with evidence from the companion manuscript that shows similar results for laboratory experiments we conclude that the short-term predictions of the time- and slip-predictable models should be rejected in favor of earthquake models that assume either fixed slip or fixed recurrence interval. This implies that the elastic rebound model underlying the time- and slip-predictable models offers no additional value in describing earthquake behavior in an event-to-event sense, but its value in a long-term sense cannot be determined. These models likely fail because they rely on assumptions that oversimplify the earthquake cycle. We note that the time and slip of these events is predicted quite well by fixed slip and fixed recurrence models, so in some sense they are time- and slip-predictable. While fixed recurrence and slip models better predict repeating earthquake behavior than the time- and slip-predictable models, we observe a correlation between slip and the preceding recurrence time for many repeating earthquake sequences in Parkfield, California. This correlation is not found in other regions, and the sequences with the correlative slip-predictable behavior are not distinguishable from nearby earthquake sequences that do not exhibit this behavior.

  15. A copula method for modeling directional dependence of genes

    Directory of Open Access Journals (Sweden)

    Park Changyi

    2008-05-01

    Full Text Available Abstract Background Genes interact with each other as basic building blocks of life, forming a complicated network. The relationship between groups of genes with different functions can be represented as gene networks. With the deposition of huge microarray data sets in public domains, study on gene networking is now possible. In recent years, there has been an increasing interest in the reconstruction of gene networks from gene expression data. Recent work includes linear models, Boolean network models, and Bayesian networks. Among them, Bayesian networks seem to be the most effective in constructing gene networks. A major problem with the Bayesian network approach is the excessive computational time. This problem is due to the interactive feature of the method that requires large search space. Since fitting a model by using the copulas does not require iterations, elicitation of the priors, and complicated calculations of posterior distributions, the need for reference to extensive search spaces can be eliminated leading to manageable computational affords. Bayesian network approach produces a discretely expression of conditional probabilities. Discreteness of the characteristics is not required in the copula approach which involves use of uniform representation of the continuous random variables. Our method is able to overcome the limitation of Bayesian network method for gene-gene interaction, i.e. information loss due to binary transformation. Results We analyzed the gene interactions for two gene data sets (one group is eight histone genes and the other group is 19 genes which include DNA polymerases, DNA helicase, type B cyclin genes, DNA primases, radiation sensitive genes, repaire related genes, replication protein A encoding gene, DNA replication initiation factor, securin gene, nucleosome assembly factor, and a subunit of the cohesin complex by adopting a measure of directional dependence based on a copula function. We have compared

  16. [Application of ARIMA model on prediction of malaria incidence].

    Science.gov (United States)

    Jing, Xia; Hua-Xun, Zhang; Wen, Lin; Su-Jian, Pei; Ling-Cong, Sun; Xiao-Rong, Dong; Mu-Min, Cao; Dong-Ni, Wu; Shunxiang, Cai

    2016-01-29

    To predict the incidence of local malaria of Hubei Province applying the Autoregressive Integrated Moving Average model (ARIMA). SPSS 13.0 software was applied to construct the ARIMA model based on the monthly local malaria incidence in Hubei Province from 2004 to 2009. The local malaria incidence data of 2010 were used for model validation and evaluation. The model of ARIMA (1, 1, 1) (1, 1, 0) 12 was tested as relatively the best optimal with the AIC of 76.085 and SBC of 84.395. All the actual incidence data were in the range of 95% CI of predicted value of the model. The prediction effect of the model was acceptable. The ARIMA model could effectively fit and predict the incidence of local malaria of Hubei Province.

  17. Mobility Modelling through Trajectory Decomposition and Prediction

    OpenAIRE

    Faghihi, Farbod

    2017-01-01

    The ubiquity of mobile devices with positioning sensors make it possible to derive user's location at any time. However, constantly sensing the position in order to track the user's movement is not feasible, either due to the unavailability of sensors, or computational and storage burdens. In this thesis, we present and evaluate a novel approach for efficiently tracking user's movement trajectories using decomposition and prediction of trajectories. We facilitate tracking by taking advantage ...

  18. A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

    Directory of Open Access Journals (Sweden)

    Ruzzo Walter L

    2006-03-01

    Full Text Available Abstract Background As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. Methods In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. Results We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM algorithms for integrating heterogenous data. We also show that by combining different data sources, prediction accuracy can improve significantly. Conclusion Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference, enhance prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.

  19. Hidden Markov Models for Human Genes

    DEFF Research Database (Denmark)

    Baldi, Pierre; Brunak, Søren; Chauvin, Yves

    1997-01-01

    We analyse the sequential structure of human genomic DNA by hidden Markov models. We apply models of widely different design: conventional left-right constructs and models with a built-in periodic architecture. The models are trained on segments of DNA sequences extracted such that they cover com...

  20. Evaluation of phenoxybenzamine in the CFA model of pain following gene expression studies and connectivity mapping.

    Science.gov (United States)

    Chang, Meiping; Smith, Sarah; Thorpe, Andrew; Barratt, Michael J; Karim, Farzana

    2010-09-16

    We have previously used the rat 4 day Complete Freund's Adjuvant (CFA) model to screen compounds with potential to reduce osteoarthritic pain. The aim of this study was to identify genes altered in this model of osteoarthritic pain and use this information to infer analgesic potential of compounds based on their own gene expression profiles using the Connectivity Map approach. Using microarrays, we identified differentially expressed genes in L4 and L5 dorsal root ganglia (DRG) from rats that had received intraplantar CFA for 4 days compared to matched, untreated control animals. Analysis of these data indicated that the two groups were distinguishable by differences in genes important in immune responses, nerve growth and regeneration. This list of differentially expressed genes defined a "CFA signature". We used the Connectivity Map approach to identify pharmacologic agents in the Broad Institute Build02 database that had gene expression signatures that were inversely related ('negatively connected') with our CFA signature. To test the predictive nature of the Connectivity Map methodology, we tested phenoxybenzamine (an alpha adrenergic receptor antagonist) - one of the most negatively connected compounds identified in this database - for analgesic activity in the CFA model. Our results indicate that at 10 mg/kg, phenoxybenzamine demonstrated analgesia comparable to that of Naproxen in this model. Evaluation of phenoxybenzamine-induced analgesia in the current study lends support to the utility of the Connectivity Map approach for identifying compounds with analgesic properties in the CFA model.

  1. Poisson Mixture Regression Models for Heart Disease Prediction

    Science.gov (United States)

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611

  2. Predicting birth weight with conditionally linear transformation models.

    Science.gov (United States)

    Möst, Lisa; Schmid, Matthias; Faschingbauer, Florian; Hothorn, Torsten

    2016-12-01

    Low and high birth weight (BW) are important risk factors for neonatal morbidity and mortality. Gynecologists must therefore accurately predict BW before delivery. Most prediction formulas for BW are based on prenatal ultrasound measurements carried out within one week prior to birth. Although successfully used in clinical practice, these formulas focus on point predictions of BW but do not systematically quantify uncertainty of the predictions, i.e. they result in estimates of the conditional mean of BW but do not deliver prediction intervals. To overcome this problem, we introduce conditionally linear transformation models (CLTMs) to predict BW. Instead of focusing only on the conditional mean, CLTMs model the whole conditional distribution function of BW given prenatal ultrasound parameters. Consequently, the CLTM approach delivers both point predictions of BW and fetus-specific prediction intervals. Prediction intervals constitute an easy-to-interpret measure of prediction accuracy and allow identification of fetuses subject to high prediction uncertainty. Using a data set of 8712 deliveries at the Perinatal Centre at the University Clinic Erlangen (Germany), we analyzed variants of CLTMs and compared them to standard linear regression estimation techniques used in the past and to quantile regression approaches. The best-performing CLTM variant was competitive with quantile regression and linear regression approaches in terms of conditional coverage and average length of the prediction intervals. We propose that CLTMs be used because they are able to account for possible heteroscedasticity, kurtosis, and skewness of the distribution of BWs. © The Author(s) 2014.

  3. Novel Predictive Model of the Debonding Strength for Masonry Members Retrofitted with FRP

    Directory of Open Access Journals (Sweden)

    Iman Mansouri

    2016-11-01

    Full Text Available Strengthening of masonry members using externally bonded (EB fiber-reinforced polymer (FRP composites has become a famous structural strengthening method over the past decade due to the popular advantages of FRP composites, including their high strength-to-weight ratio and excellent corrosion resistance. In this study, gene expression programming (GEP, as a novel tool, has been used to predict the debonding strength of retrofitted masonry members. The predictions of the new debonding resistance model, as well as several other models, are evaluated by comparing their estimates with experimental results of a large test database. The results indicate that the new model has the best efficiency among the models examined and represents an improvement to other models. The root mean square errors (RMSE of the best empirical Kashyap model in training and test data were, respectively, reduced by 51.7% and 41.3% using the GEP model in estimating debonding strength.

  4. Animal models of gene-environment interactions in schizophrenia.

    Science.gov (United States)

    Ayhan, Yavuz; Sawa, Akira; Ross, Christopher A; Pletnikov, Mikhail V

    2009-12-07

    The pathogenesis of schizophrenia and related mental illnesses likely involves multiple interactions between susceptibility genes of small effects and environmental factors. Gene-environment interactions occur across different stages of neurodevelopment to produce heterogeneous clinical and pathological manifestations of the disease. The main obstacle for mechanistic studies of gene-environment interplay has been the paucity of appropriate experimental systems for elucidating the molecular pathways that mediate gene-environment interactions relevant to schizophrenia. Recent advances in psychiatric genetics and a plethora of experimental data from animal studies allow us to suggest a new approach to gene-environment interactions in schizophrenia. We propose that animal models based on identified genetic mutations and measurable environment factors will help advance studies of the molecular mechanisms of gene-environment interplay.

  5. Exceptions to the rule: case studies in the prediction of pathogenicity for genetic variants in hereditary cancer genes.

    Science.gov (United States)

    Rosenthal, E T; Bowles, K R; Pruss, D; van Kan, A; Vail, P J; McElroy, H; Wenstrup, R J

    2015-12-01

    Based on current consensus guidelines and standard practice, many genetic variants detected in clinical testing are classified as disease causing based on their predicted impact on the normal expression or function of the gene in the absence of additional data. However, our laboratory has identified a subset of such variants in hereditary cancer genes for which compelling contradictory evidence emerged after the initial evaluation following the first observation of the variant. Three representative examples of variants in BRCA1, BRCA2 and MSH2 that are predicted to disrupt splicing, prematurely truncate the protein, or remove the start codon were evaluated for pathogenicity by analyzing clinical data with multiple classification algorithms. Available clinical data for all three variants contradicts the expected pathogenic classification. These variants illustrate potential pitfalls associated with standard approaches to variant classification as well as the challenges associated with monitoring data, updating classifications, and reporting potentially contradictory interpretations to the clinicians responsible for translating test outcomes to appropriate clinical action. It is important to address these challenges now as the model for clinical testing moves toward the use of large multi-gene panels and whole exome/genome analysis, which will dramatically increase the number of genetic variants identified. © 2015 The Authors. Clinical Genetics published by John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  6. Prediction of hourly solar radiation with multi-model framework

    International Nuclear Information System (INIS)

    Wu, Ji; Chan, Chee Keong

    2013-01-01

    Highlights: • A novel approach to predict solar radiation through the use of clustering paradigms. • Development of prediction models based on the intrinsic pattern observed in each cluster. • Prediction based on proper clustering and selection of model on current time provides better results than other methods. • Experiments were conducted on actual solar radiation data obtained from a weather station in Singapore. - Abstract: In this paper, a novel multi-model prediction framework for prediction of solar radiation is proposed. The framework started with the assumption that there are several patterns embedded in the solar radiation series. To extract the underlying pattern, the solar radiation series is first segmented into smaller subsequences, and the subsequences are further grouped into different clusters. For each cluster, an appropriate prediction model is trained. Hence a procedure for pattern identification is developed to identify the proper pattern that fits the current period. Based on this pattern, the corresponding prediction model is applied to obtain the prediction value. The prediction result of the proposed framework is then compared to other techniques. It is shown that the proposed framework provides superior performance as compared to others

  7. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer

    Directory of Open Access Journals (Sweden)

    Gabere MN

    2016-06-01

    Full Text Available Musa Nur Gabere,1 Mohamed Aly Hussein,1 Mohammad Azhar Aziz2 1Department of Bioinformatics, King Abdullah International Medical Research Center/King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia; 2Colorectal Cancer Research Program, Department of Medical Genomics, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia Purpose: There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC. The selection of important features is a crucial step before training a classifier.Methods: In this study, we built a model that uses support vector machine (SVM to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid.Results: The best model, which used 30 genes and RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten fold and leave-one-out cross validations in discriminating the cancer samples from the normal samples. With this 30 genes set from mRMR, six classifiers were trained using random forest (RF, Bayes net (BN, multilayer perceptron (MLP, naïve Bayes (NB, reduced error pruning tree (REPT, and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP. Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1

  8. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    Science.gov (United States)

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. While bacterial genes responsible for

  9. Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models

    Science.gov (United States)

    Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Pong-Wong, Ricardo; Agakov, Felix; Navarro, Pau; Haley, Chris S.

    2015-01-01

    We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimizing genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge. PMID:25918167

  10. Model predictive control of a crude oil distillation column

    Directory of Open Access Journals (Sweden)

    Morten Hovd

    1999-04-01

    Full Text Available The project of designing and implementing model based predictive control on the vacuum distillation column at the Nynäshamn Refinery of Nynäs AB is described in this paper. The paper describes in detail the modeling for the model based control, covers the controller implementation, and documents the benefits gained from the model based controller.

  11. A burnout prediction model based around char morphology

    Energy Technology Data Exchange (ETDEWEB)

    T. Wu; E. Lester; M. Cloke [University of Nottingham, Nottingham (United Kingdom). Nottingham Energy and Fuel Centre

    2005-07-01

    Poor burnout in a coal-fired power plant has marked penalties in the form of reduced energy efficiency and elevated waste material that can not be utilized. The prediction of coal combustion behaviour in a furnace is of great significance in providing valuable information not only for process optimization but also for coal buyers in the international market. Coal combustion models have been developed that can make predictions about burnout behaviour and burnout potential. Most of these kinetic models require standard parameters such as volatile content, particle size and assumed char porosity in order to make a burnout prediction. This paper presents a new model called the Char Burnout Model (ChB) that also uses detailed information about char morphology in its prediction. The model can use data input from one of two sources. Both sources are derived from image analysis techniques. The first from individual analysis and characterization of real char types using an automated program. The second from predicted char types based on data collected during the automated image analysis of coal particles. Modelling results were compared with a different carbon burnout kinetic model and burnout data from re-firing the chars in a drop tube furnace operating at 1300{sup o}C, 5% oxygen across several residence times. An improved agreement between ChB model and DTF experimental data proved that the inclusion of char morphology in combustion models can improve model predictions. 27 refs., 4 figs., 4 tabs.

  12. Questioning the Faith - Models and Prediction in Stream Restoration (Invited)

    Science.gov (United States)

    Wilcock, P.

    2013-12-01

    River management and restoration demand prediction at and beyond our present ability. Management questions, framed appropriately, can motivate fundamental advances in science, although the connection between research and application is not always easy, useful, or robust. Why is that? This presentation considers the connection between models and management, a connection that requires critical and creative thought on both sides. Essential challenges for managers include clearly defining project objectives and accommodating uncertainty in any model prediction. Essential challenges for the research community include matching the appropriate model to project duration, space, funding, information, and social constraints and clearly presenting answers that are actually useful to managers. Better models do not lead to better management decisions or better designs if the predictions are not relevant to and accepted by managers. In fact, any prediction may be irrelevant if the need for prediction is not recognized. The predictive target must be developed in an active dialog between managers and modelers. This relationship, like any other, can take time to develop. For example, large segments of stream restoration practice have remained resistant to models and prediction because the foundational tenet - that channels built to a certain template will be able to transport the supplied sediment with the available flow - has no essential physical connection between cause and effect. Stream restoration practice can be steered in a predictive direction in which project objectives are defined as predictable attributes and testable hypotheses. If stream restoration design is defined in terms of the desired performance of the channel (static or dynamic, sediment surplus or deficit), then channel properties that provide these attributes can be predicted and a basis exists for testing approximations, models, and predictions.

  13. Predicting Magazine Audiences with a Loglinear Model.

    Science.gov (United States)

    1987-07-01