WorldWideScience

Sample records for gene predicts extrastriatal

  1. Loss of extra-striatal phosphodiesterase 10A expression in early premanifest Huntington's disease gene carriers.

    Science.gov (United States)

    Wilson, Heather; Niccolini, Flavia; Haider, Salman; Marques, Tiago Reis; Pagano, Gennaro; Coello, Christopher; Natesan, Sridhar; Kapur, Shitij; Rabiner, Eugenii A; Gunn, Roger N; Tabrizi, Sarah J; Politis, Marios

    2016-09-15

    Huntington's disease (HD) is a monogenic neurodegenerative disorder with an underlying pathology involving the toxic effect of mutant huntingtin protein primarily in striatal and cortical neurons. Phosphodiesterase 10A (PDE10A) regulates intracellular signalling cascades, thus having a key role in promoting neuronal survival. Using positron emission tomography (PET) with [(11)C]IMA107, we investigated the in vivo extra-striatal expression of PDE10A in 12 early premanifest HD gene carriers. Image processing and kinetic modelling were performed using MIAKAT™. Parametric images of [(11)C]IMA107 non-displaceable binding potential (BPND) were generated from the dynamic [(11)C]IMA107 scans using the simplified reference tissue model with the cerebellum as the reference tissue for nonspecific binding. We set a threshold criterion for meaningful quantification of [(11)C]IMA107 BPND at 0.30 in healthy control data; regions meeting this criterion were designated as regions of interest (ROIs). MRI-based volumetric analysis showed no atrophy in ROIs. We found significant differences in mean ROI [(11)C]IMA107 BPND between HD gene carriers and healthy controls. HD gene carriers had significant loss of PDE10A within the insular cortex and occipital fusiform gyrus compared to healthy controls. Insula and occipital fusiform gyrus are important brain areas for the regulation of cognitive and limbic function that is impaired in HD. Our findings suggest that dysregulation of PDE10A-mediated intracellular signalling could be an early phenomenon in the course of HD with relevance also for extra-striatal brain areas.
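The ROI selection rule described in this abstract (keep regions whose BPND in healthy control data reaches 0.30) can be sketched as a simple filter. This is only an illustration under stated assumptions: the comparison is assumed to be "greater than or equal to", and the region names and BPND values are made up, not taken from the study.

```python
BPND_THRESHOLD = 0.30  # threshold quoted in the abstract


def select_rois(mean_control_bpnd, threshold=BPND_THRESHOLD):
    """Return the names of regions whose mean control BPND meets the threshold.

    Assumption: "meeting this criterion" is read as BPND >= threshold.
    """
    return sorted(
        region for region, bpnd in mean_control_bpnd.items() if bpnd >= threshold
    )


# Hypothetical values, for illustration only (not data from the study).
example_bpnd = {"region_a": 0.45, "region_b": 0.30, "region_c": 0.12}
rois = select_rois(example_bpnd)
```

Regions below the threshold are dropped before any group comparison, so low-signal regions do not enter the statistics.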

  2. Effect of age on extrastriatal dopamine D2 receptor availability

    Energy Technology Data Exchange (ETDEWEB)

    Wang, G.J.; Volkow, N.D.; Fowler, J.S. [Brookhaven National Lab., Upton, NY (United States)]; [SUNY, Stony Brook, NY (United States)]

    1996-05-01

    It is known that dopamine (DA) D2 receptor availability in basal ganglia decreases with age. This study was done to assess the effects of age on extrastriatal DA D2 receptors. DA D2 receptor availability was evaluated in 42 healthy male subjects (age mean 41 ± 16, range 21-86 years old) using positron emission tomography (PET) and [C-11]raclopride. DA D2 receptor availability was measured using the ratio of the distribution volume in the region of interest (caudate, putamen, thalamus, frontal, occipital cortices, temporal insula, cingulate and orbitofrontal gyri) to that in the cerebellum, which is a function of Bmax/Kd. Pearson product-moment correlation was used to evaluate the correlation between age and D2 receptor availability. DA D2 receptor availability in putamen (r ≤ 0.0001), caudate (r ≤ 0.0002), thalamus (r ≤ 0.03), and temporal insula (r ≤ 0.01) was significantly correlated with age. The decrements in D2 receptors with age were lower in extrastriatal than in striatal regions and corresponded to a decrease of 4.7% per decade in caudate, 6.2% in putamen, 2.1% in thalamus and 2.5% in temporal insula. This study documents age related decrement of DA D2 receptor availability in striatal and extrastriatal regions.
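The two quantities this abstract relies on, the region-to-cerebellum distribution volume ratio and the Pearson product-moment correlation with age, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code; the function names and the sample numbers are hypothetical.

```python
from math import sqrt


def dv_ratio(dv_region, dv_cerebellum):
    """Distribution volume ratio: region of interest over cerebellum."""
    return dv_region / dv_cerebellum


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up ages and ratios that fall on a straight line, for illustration only.
ages = [25, 35, 45, 55]
ratios = [3.9, 3.7, 3.5, 3.3]
r = pearson_r(ages, ratios)
```

With real data the points scatter around the trend, so the coefficient lies strictly between -1 and 0 for a decline with age.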

  3. Extrastriatal dopamine D2-receptor availability in social anxiety disorder.

    Science.gov (United States)

    Plavén-Sigray, Pontus; Hedman, Erik; Victorsson, Pauliina; Matheson, Granville J; Forsberg, Anton; Djurfeldt, Diana R; Rück, Christian; Halldin, Christer; Lindefors, Nils; Cervenka, Simon

    2017-05-01

    Alterations in the dopamine system are hypothesized to influence the expression of social anxiety disorder (SAD) symptoms. However, molecular imaging studies comparing dopamine function between patients and control subjects have yielded conflicting results. Importantly, while all previous investigations focused on the striatum, findings from activation and blood flow studies indicate that prefrontal and limbic brain regions have a central role in the pathophysiology. The objective of this study was to investigate extrastriatal dopamine D2-receptor (D2-R) availability in SAD. We examined 12 SAD patients and 16 healthy controls using positron emission tomography and the high-affinity D2-R radioligand [(11)C]FLB457. Parametric images of D2-R binding potential were derived using the Logan graphical method with cerebellum as reference region. Two-tailed one-way independent ANCOVAs, with age as covariate, were used to examine differences in D2-R availability between groups using both region-based and voxel-wise analyses. The region-based analysis showed a medium effect size of higher D2-R levels in the orbitofrontal cortex (OFC) in patients, although this result did not remain significant after correction for multiple comparisons. The voxel-wise comparison revealed elevated D2-R availability in patients within OFC and right dorsolateral prefrontal cortex after correction for multiple comparisons. These preliminary results suggest that an aberrant extrastriatal dopamine system may be part of the disease mechanism in SAD. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Dopamine D(2) receptor quantification in extrastriatal brain regions using [(123)I]epidepride with bolus/infusion

    DEFF Research Database (Denmark)

    Pinborg, L H; Videbaek, C; Knudsen, G M

    2000-01-01

    The iodinated benzamide epidepride, which shows a picomolar affinity binding to dopamine D(2) receptors, has been designed for in vivo studies using SPECT. The aim of the present study was to apply a steady-state condition by the bolus/infusion approach with [(123)I]epidepride for the quantification of striatal and extrastriatal dopamine D(2) receptors in humans. In this way the distribution volume of the tracer can be determined from a single SPECT image and one blood sample. Based on bolus experiments, an algorithm using conventional convolution arguments for prediction of the outcome...... has a unique signal-to-noise ratio compared to [(123)I]IBZM but present difficulties for steady-state measurements of striatal regions. The bolus/infusion approach is particularly feasible for quantification of the binding potential in extrastriatal regions....

  5. PET neuroimaging of extrastriatal dopamine receptors and prefrontal cortex functions.

    Science.gov (United States)

    Takahashi, Hidehiko

    2013-12-01

    The role of prefrontal dopamine D1 receptors in prefrontal cortex (PFC) functions, including working memory, is widely investigated. However, human (healthy volunteers and schizophrenia patients) positron emission tomography (PET) studies about the relationship between prefrontal D1 receptors and PFC functions are somewhat inconsistent. We argued that several factors including an inverted U-shaped relationship between prefrontal D1 receptors and PFC functions might be responsible for these inconsistencies. In contrast to D1 receptors, relatively less attention has been paid to the role of D2 receptors in PFC functions. Several animal and human pharmacological studies have reported that the systemic administration of D2 receptor agonist/antagonist modulates PFC functions, although those studies do not tell us which region(s) is responsible for the effect. Furthermore, while prefrontal D1 receptors are primarily involved in working memory, other PFC functions such as set-shifting seem to be differentially modulated by dopamine. PET studies of extrastriatal D2 receptors including ours suggested that orchestration of prefrontal dopamine transmission and hippocampal dopamine transmission might be necessary for a broad range of normal PFC functions. In order to understand the complex effects of dopamine signaling on PFC functions, measuring a single index related to basic dopamine tone is not sufficient. For a better understanding of the meanings of PET indices related to neurotransmitters, comprehensive information (presynaptic, postsynaptic, and beyond receptor signaling) will be required. Still, an interdisciplinary approach combining molecular imaging techniques with cognitive neuroscience and clinical psychiatry will provide new perspectives for understanding the neurobiology of neuropsychiatric disorders and their innovative drug developments.

  6. Striatal and extrastriatal atrophy in Huntington's disease and its relationship with length of the CAG repeat

    Directory of Open Access Journals (Sweden)

    H.H. Ruocco

    2006-08-01

    Huntington's disease (HD) is an autosomal dominant neurodegenerative disorder that affects the striatum most severely. However, except for juvenile forms, relative preservation of the cerebellum has been reported. The objective of the present study was to perform MRI measurements of caudate, putamen, cerebral, and cerebellar volumes and correlate these findings with the length of the CAG repeat and clinical parameters. We evaluated 50 consecutive patients with HD using MRI volumetric measurements and compared them to normal controls. Age at onset of the disease ranged from 4 to 73 years (mean: 43.1 years). The length of the CAG repeat ranged from 40 to 69 (mean: 47.2 CAG). HD patients presented marked atrophy of the caudate and putamen, as well as reduced cerebellar and cerebral volumes. There was a significant correlation between age at onset of HD and length of the CAG repeat, as well as clinical disability and age at onset. The degree of basal ganglia atrophy correlated with the length of the CAG repeat. There was no correlation between cerebellar or cerebral volume and length of the CAG repeat. However, there was a tendency to a positive correlation between duration of disease and cerebellar atrophy. While there was a negative correlation of length of the CAG repeat with age at disease onset and with striatal degeneration, its influence on extrastriatal atrophy, including the cerebellum, was not clear. Extrastriatal atrophy occurs later in HD and may be related to disease duration.

  7. Striatal and extrastriatal atrophy in Huntington's disease and its relationship with length of the CAG repeat.

    Science.gov (United States)

    Ruocco, H H; Lopes-Cendes, I; Li, L M; Santos-Silva, M; Cendes, F

    2006-08-01

    Huntington's disease (HD) is an autosomal dominant neurodegenerative disorder that affects the striatum most severely. However, except for juvenile forms, relative preservation of the cerebellum has been reported. The objective of the present study was to perform MRI measurements of caudate, putamen, cerebral, and cerebellar volumes and correlate these findings with the length of the CAG repeat and clinical parameters. We evaluated 50 consecutive patients with HD using MRI volumetric measurements and compared them to normal controls. Age at onset of the disease ranged from 4 to 73 years (mean: 43.1 years). The length of the CAG repeat ranged from 40 to 69 (mean: 47.2 CAG). HD patients presented marked atrophy of the caudate and putamen, as well as reduced cerebellar and cerebral volumes. There was a significant correlation between age at onset of HD and length of the CAG repeat, as well as clinical disability and age at onset. The degree of basal ganglia atrophy correlated with the length of the CAG repeat. There was no correlation between cerebellar or cerebral volume and length of the CAG repeat. However, there was a tendency to a positive correlation between duration of disease and cerebellar atrophy. While there was a negative correlation of length of the CAG repeat with age at disease onset and with striatal degeneration, its influence on extrastriatal atrophy, including the cerebellum, was not clear. Extrastriatal atrophy occurs later in HD and may be related to disease duration.

  8. Changes in extra-striatal functional connectivity in patients with schizophrenia in a psychotic episode.

    Science.gov (United States)

    Peters, Henning; Riedl, Valentin; Manoliu, Andrei; Scherr, Martin; Schwerthöffer, Dirk; Zimmer, Claus; Förstl, Hans; Bäuml, Josef; Sorg, Christian; Koch, Kathrin

    2017-01-01

    In patients with schizophrenia in a psychotic episode, intra-striatal intrinsic connectivity is increased in the putamen but not ventral striatum. Furthermore, multimodal changes have been observed in the anterior insula that interact extensively with the putamen. We hypothesised that during psychosis, putamen extra-striatal functional connectivity is altered with both the anterior insula and areas normally connected with the ventral striatum (i.e. altered functional connectivity distinctiveness of putamen and ventral striatum). We acquired resting-state functional magnetic resonance images from 21 patients with schizophrenia in a psychotic episode and 42 controls. Patients had decreased functional connectivity: the putamen with right anterior insula and dorsal prefrontal cortex, the ventral striatum with left anterior insula. Decreased functional connectivity between putamen and right anterior insula was specifically associated with patients' hallucinations. Functional connectivity distinctiveness was impaired only for the putamen. Results indicate aberrant extra-striatal connectivity during psychosis and a relationship between reduced putamen-right anterior insula connectivity and hallucinations. Data suggest that altered intrinsic connectivity links striatal and insular pathophysiology in psychosis. © The Royal College of Psychiatrists 2017.

  9. Extrastriatal binding of [¹²³I]FP-CIT in the thalamus and pons

    DEFF Research Database (Denmark)

    Koch, Walter; Unterrainer, Marcus; Xiong, Guoming

    2014-01-01

    PURPOSE: Apart from binding to the dopamine transporter (DAT), [(123)I]FP-CIT shows moderate affinity for the serotonin transporter (SERT), allowing imaging of both monoamine transporters in a single imaging session in different brain areas. The aim of this study was to systematically evaluate...... error) of 8.2 ± 1.3 % for the thalamus and 6.8 ± 2.9 % for the pons was shown. CONCLUSION: The potential to evaluate extrastriatal predominant SERT binding in addition to the striatal DAT in a single imaging session was shown using a large database of [(123)I]FP-CIT scans in healthy controls. For both...

  10. The Prediction of Rice Gene by Fgenesh

    Institute of Scientific and Technical Information of China (English)

    ZHANG Sheng-li; LI Dong-fang; ZHANG Gai-sheng; WANG Jun-wei; NIU Na

    2008-01-01

    This study has been carried out to give some scientific reasons for genome annotation, shorten the annotating time, and improve the results of gene prediction. Taking the sequence of the 6th chromosome, which has more length sequences than others, of Oryza sativa L. ssp. japonica cv. Nipponbare as analysis data in this research, the gene prediction of monocots module, rice, has been done by using Fgenesh ver. 2.0, and the predicting results have been explored particularly by bioinformatics methods. Results showed that the number of predicted genes for this chromosome was very close to the number of TIGR annotated genes. The majority of the predicted genes were multi-exon genes which had a percentage of 77.52. Length range was very big in the predicted genes. According to the significant match number, multi-exon genes can be predicted more veracity than single exon genes and the support can be reached up to 100% by TIGR annotation and up to 78% by cDNA. From the angle of predicted exons location of multi-exon genes, the internal exons and last exons had a high support of cDNA. The length of internal exons was relatively short in high (>95% length, >78% similarity) cDNA and/or TIGR annotation support multi-exon genes, but the first exons and last exons were on the reverse. The majority of single exon genes which had more than 95% in length, and 78% in similarity support by cDNA and/or TIGR annotation was relatively short in length. From the angle of exon number, the majority of the multi-exon genes of high (> 95% length, > 78% similarity) cDNA and/or TIGR annotation support had no more than 5 exon number. It was concluded that the rice gene prediction by Fgenesh was very good but needed modification manually to some extent according to cDNA support after aligning the predicting sequence of genes with cDNA database of rice.

  11. Predicting gene expression from sequence: a reexamination.

    Directory of Open Access Journals (Sweden)

    Yuan Yuan

    2007-11-01

    Although much of the information regarding genes' expressions is encoded in the genome, deciphering such information has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably to the 73% accuracy of BT. The fact that our approach did not use position and orientation information of the predicted binding sites but achieved a higher prediction accuracy motivated us to investigate a few biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's ones. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.

  12. Extrastriatal dopamine D2/3 receptors and cortical grey matter volumes in antipsychotic-naïve schizophrenia patients before and after initial antipsychotic treatment

    DEFF Research Database (Denmark)

    Nørbak-Emig, Henrik; Pinborg, Lars H; Raghava, Jayachandra M

    2017-01-01

    blockade at follow-up, was related to regional cortical volume changes. In post-hoc analyses excluding three patients with cannabis use we found that higher D2/3 receptor occupancy was significantly associated with an increase in right frontal grey matter volume. CONCLUSIONS: The present data do...... not support an association between extrastriatal D2/3 receptor blockade and extrastriatal grey matter loss in the early phases of schizophrenia. Although inconclusive, our exclusion of patients tested positive for cannabis use speaks to keeping attention to potential confounding factors in imaging studies....

  13. Gene Prediction Using Multinomial Probit Regression with Bayesian Gene Selection

    Directory of Open Access Journals (Sweden)

    Xiaodong Wang

    2004-01-01

    A critical issue for the construction of genetic regulatory networks is the identification of network topology from data. In the context of deterministic and probabilistic Boolean networks, as well as their extension to multilevel quantization, this issue is related to the more general problem of expression prediction in which we want to find small subsets of genes to be used as predictors of target genes. Given some maximum number of predictors to be used, a full search of all possible predictor sets is combinatorially prohibitive except for small predictor sets, and even then, may require supercomputing. Hence, suboptimal approaches to finding predictor sets and network topologies are desirable. This paper considers Bayesian variable selection for prediction using a multinomial probit regression model with data augmentation to turn the multinomial problem into a sequence of smoothing problems. There are multiple regression equations and we want to select the same strongest genes for all regression equations to constitute a target predictor set or, in the context of a genetic network, the dependency set for the target. The probit regressor is approximated as a linear combination of the genes and a Gibbs sampler is employed to find the strongest genes. Numerical techniques to speed up the computation are discussed. After finding the strongest genes, we predict the target gene based on the strongest genes, with the coefficient of determination being used to measure predictor accuracy. Using malignant melanoma microarray data, we compare two predictor models, the estimated probit regressors themselves and the optimal full-logic predictor based on the selected strongest genes, and we compare these to optimal prediction without feature selection.

  14. Extrastriatal dopamine D2/3 receptors and cortical grey matter volumes in antipsychotic-naïve schizophrenia patients before and after initial antipsychotic treatment.

    Science.gov (United States)

    Nørbak-Emig, Henrik; Pinborg, Lars H; Raghava, Jayachandra M; Svarer, Claus; Baaré, William F C; Allerup, Peter; Friberg, Lars; Rostrup, Egill; Glenthøj, Birte; Ebdrup, Bjørn H

    2017-10-01

    Long-term dopamine D2/3 receptor blockade, common to all antipsychotics, may underlie progressive brain volume changes observed in patients with chronic schizophrenia. In the present study, we examined associations between cortical volume changes and extrastriatal dopamine D2/3 receptor binding potentials (BPND) in first-episode schizophrenia patients at baseline and after antipsychotic treatment. Twenty-two initially antipsychotic-naïve patients underwent magnetic resonance imaging (MRI), [(123)I]epidepride single-photon emission computerised tomography (SPECT), and psychopathology assessments before and after 3 months of treatment with either risperidone (N = 13) or zuclopenthixol (N = 9). Twenty healthy controls matched on age, gender and parental socioeconomic status underwent baseline MRI and SPECT. Neither extrastriatal D2/3 receptor BPND at baseline, nor blockade at follow-up, was related to regional cortical volume changes. In post-hoc analyses excluding three patients with cannabis use we found that higher D2/3 receptor occupancy was significantly associated with an increase in right frontal grey matter volume. The present data do not support an association between extrastriatal D2/3 receptor blockade and extrastriatal grey matter loss in the early phases of schizophrenia. Although inconclusive, our exclusion of patients who tested positive for cannabis use speaks to keeping attention to potential confounding factors in imaging studies.

  15. Integrating Gene Ontology and Blast to predict gene functions

    Institute of Scientific and Technical Information of China (English)

    WANG Cheng-gang; MO Zhi-hong

    2007-01-01

    A GoBlast system was built to predict gene function by integrating Blast search and Gene Ontology (GO) annotations. The operating system was Debian Linux 3.1, with Apache as the web server and a MySQL database as the data storage system. FASTA files with GO annotations were taken as the sequence source for blast alignment, which were formatted by the wu-formatdb program. The GoBlast system includes three Bioperl modules in Perl: a data input module, a data process module and a data output module. A GoBlast query starts with an amino acid or nucleotide sequence. It ends with an output in an html page, presenting high scoring gene products which are of a high homology to the queried sequence and listing associated GO terms beside the respective gene products. A simple click on a GO term leads to the detailed explanation of the specific gene function. This enables gene function prediction by Blast. GoBlast can be a very useful tool for functional genome research and is available for free at http://bioq.org/goblast.

  16. Predicting metastasized seminoma using gene expression.

    Science.gov (United States)

    Ruf, Christian G; Linbecker, Michael; Port, Matthias; Riecke, Armin; Schmelz, Hans U; Wagner, Walter; Meineke, Victor; Abend, Michael

    2012-07-01

    Treatment options for testis cancer depend on the histological subtype as well as on the clinical stage. An accurate staging is essential for correct treatment. The 'golden standard' for staging purposes is CT, but occult metastasis cannot be detected with this method. Currently, parameters such as primary tumour size, vessel invasion or invasion of the rete testis are used for predicting occult metastasis. Last year the association of these parameters with metastasis could not be validated in a new independent cohort. Gene expression analysis in testis cancer allowed discrimination between the different histological subtypes (seminoma and non-seminoma) as well as testis cancer and normal testis tissue. In a two-stage study design we (i) screened the whole genome (using human whole genome microarrays) for candidate genes associated with the metastatic stage in seminoma and (ii) validated and quantified gene expression of our candidate genes (real-time quantitative polymerase chain reaction) on another independent group. Gene expression measurements of two of our candidate genes (dopamine receptor D1 [DRD1] and family with sequence similarity 71, member F2 [FAM71F2]) examined in primary testis cancers made it possible to discriminate the metastasis status in seminoma. The discriminative ability of the genes exceeded the predictive significance of currently used histological/pathological parameters. Based on gene expression analysis the present study provides suggestions for improved individual decision making either in favour of early adjuvant therapy or increased surveillance. To evaluate the usefulness of gene expression profiling for predicting metastatic status in testicular seminoma at the time of first diagnosis compared with established clinical and pathological parameters. Total RNA was isolated from testicular tumours of metastasized patients (12 patients, clinical stage IIa-III), non-metastasized patients (40, clinical stage I) and adjacent 'normal' tissue

  17. Carbon-11 epidepride: a suitable radioligand for PET investigation of striatal and extrastriatal dopamine D2 receptors

    Energy Technology Data Exchange (ETDEWEB)

    Langer, Oliver; Halldin, Christer; Dolle, Frederic; Swahn, Carl-Gunnar; Olsson, Hans; Lundkvist, Per Karlsson; Hall, Haakan; Sandell, Johan; Vaufrey, Camilla; Loc'h, Christian; Franzoise; Crouzel, Christian; Maziere, Bernard; Farde, Lars

    1999-07-01

    Epidepride {(S)-(-)-N-([1-ethyl-2-pyrrolidinyl]methyl)-5-iodo-2,3-dimethoxybenzamide} binds with a picomolar affinity (Ki = 24 pM) to the dopamine D2 receptor. Iodine-123-labeled epidepride has been used previously to study striatal and extrastriatal dopamine D2 receptors with single photon emission computed tomography (SPECT). Our aim was to label epidepride with carbon-11 for comparative quantitative studies between positron emission tomography (PET) and SPECT. Epidepride was synthesized from its bromo-analogue FLB 457 via the corresponding trimethyl-tin derivative. In an alternative synthetic pathway, the corresponding substituted benzoic acid was reacted with the optically pure aminomethylpyrrolidine-derivative. Demethylation of epidepride gave the desmethyl-derivative, which was reacted with [¹¹C]methyl triflate. Total radiochemical yield was 40-50% within a total synthesis time of 30 min. The specific radioactivity at the end of synthesis was 37-111 GBq/µmol (1,000-3,000 Ci/mmol). Human postmortem whole-hemisphere autoradiography demonstrated dense binding in the caudate putamen, and also in extrastriatal areas such as the thalamus and the neocortex. The binding was inhibited by unlabeled raclopride. PET studies in a cynomolgus monkey demonstrated high uptake in the striatum and in several extrastriatal regions. At 90 min after injection, uptake in the striatum, thalamus and neocortex was about 11, 4, and 2 times higher than in the cerebellum, respectively. Pretreatment experiment with unlabeled raclopride (1 mg/kg) inhibited 50-70% of [¹¹C]epidepride binding. The fraction of unchanged [¹¹C]epidepride in monkey plasma determined by a gradient high performance liquid chromatography (HPLC) method was about 30% of the total radioactivity at 30 min after injection of [¹¹C]epidepride. The availability of [¹¹C]epidepride allows the PET-verification of the data obtained from quantitation studies with
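The two specific-radioactivity figures quoted in this abstract (37-111 GBq per micromole and 1,000-3,000 Ci per millimole) express the same range in different units. A short sketch of the conversion, using the standard definition 1 Ci = 37 GBq; the function name is made up for illustration.

```python
GBQ_PER_CI = 37.0  # 1 Ci = 3.7e10 Bq = 37 GBq
UMOL_PER_MMOL = 1000.0


def gbq_per_umol_to_ci_per_mmol(value_gbq_per_umol):
    """Convert specific radioactivity from GBq/umol to Ci/mmol."""
    ci_per_umol = value_gbq_per_umol / GBQ_PER_CI
    return ci_per_umol * UMOL_PER_MMOL


low = gbq_per_umol_to_ci_per_mmol(37.0)    # lower bound quoted in the abstract
high = gbq_per_umol_to_ci_per_mmol(111.0)  # upper bound quoted in the abstract
```

The lower bound works out to about 1,000 Ci/mmol and the upper bound to about 3,000 Ci/mmol, matching the parenthetical values in the abstract.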

  18. Gene function prediction based on the Gene Ontology hierarchical structure.

    Science.gov (United States)

    Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao

    2014-01-01

    The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7% recall: 48.9%). The experimental results demonstrate that when the size of training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.
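The reported F-value is consistent with the balanced F-measure, the harmonic mean of precision and recall. The sketch below assumes that definition (the abstract does not state which F-measure was used); the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f_value = f_measure(0.527, 0.489)
```

Rounded to three decimals this gives 0.507, in line with the 50.7% quoted above.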

  19. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm
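The TRN20-TST80 strategy named in this abstract is a random partition with 20% of accessions used for training and 80% for testing. A minimal sketch of one such partition follows; it is not the authors' code, and the function name, seed handling and placeholder accession IDs are assumptions made for illustration.

```python
import random


def random_train_test_split(accessions, train_fraction=0.20, seed=0):
    """Randomly partition accessions into training and testing lists.

    A fixed seed makes the partition repeatable; cross-validation repeats
    the split with different seeds.
    """
    rng = random.Random(seed)
    shuffled = list(accessions)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder accession IDs 0..99, for illustration only.
train_set, test_set = random_train_test_split(range(100))
```

Training on the smaller share is the unusual part of this design: it checks how well a model fitted to a small genotyped and phenotyped subset predicts the rest of the collection.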

  20. Preliminary assessment of extrastriatal dopamine D-2 receptor binding in the rodent and nonhuman primate brains using the high affinity radioligand, (18)F-fallypride

    Energy Technology Data Exchange (ETDEWEB)

    Mukherjee, Jogeshwar; Yang, Z.-Y.; Brown, Terry; Lew, Robert; Wernick, Miles; Ouyang Xiaohu; Yasillo, Nicholas; Chen, C.-T.; Mintzer, Robert; Cooper, Malcolm

    1999-07-01

    We have identified the value of (18)F-fallypride {(S)-N-[(1-allyl-2-pyrrolidinyl)methyl]-5-(3-[(18)F]fluoropropyl)-2,3-dimethoxybenzamide} as a dopamine D-2 receptor radiotracer for the study of striatal and extrastriatal receptors. Fallypride exhibits high affinities for D-2 and D-3 subtypes and low affinity for D-4 ((3)H-spiperone IC50s: D-2=0.05 nM [rat striata], D-3=0.30 nM [SF9 cell lines, rat recombinant], and D-4=240 nM [CHO cell lines, human recombinant]). Biodistribution in the rat brain showed localization of (18)F-fallypride in striata and extrastriatal regions such as the frontal cortex, parietal cortex, amygdala, hippocampus, thalamus, and hypothalamus. In vitro autoradiographic studies in sagittal slices of the rat brain showed localization of (18)F-fallypride in striatal and several extrastriatal regions, including the medulla. Positron emission tomography (PET) experiments with (18)F-fallypride in male rhesus monkeys were carried out in a PET VI scanner. In several PET experiments, apart from the specific binding seen in the striatum, specific binding of (18)F-fallypride was also identified in extrastriatal regions (in a lower brain slice, possibly the thalamus). Specific binding in the extrastriata was, however, significantly lower compared with that observed in the striata of the monkeys (extrastriata/cerebellum = 2, striata/cerebellum = 10). Postmortem analysis of the monkey brain revealed significant (18)F-fallypride binding in the striata, whereas binding was also observed in extrastriatal regions such as the thalamus, cortical areas, and brain stem.

  1. Extrastriatal binding of [(123)I]FP-CIT in the thalamus and pons: gender and age dependencies assessed in a European multicentre database of healthy controls

    Energy Technology Data Exchange (ETDEWEB)

    Koch, Walter; Unterrainer, Marcus; Xiong, Guoming; Bartenstein, Peter [University of Munich, Department of Nuclear Medicine, Munich (Germany)]; Diemling, Markus [Hermes Medical Solutions, Stockholm (Sweden)]; Varrone, Andrea [Karolinska University Hospital, Karolinska Institutet, Department of Clinical Neuroscience, Centre for Psychiatry Research, Stockholm (Sweden)]; Dickson, John C. [UCLH NHS Foundation Trust and University College, Institute of Nuclear Medicine, London (United Kingdom)]; Tossici-Bolt, Livia [University Hospitals Southampton NHS Trust, Department of Medical Physics, Southampton (United Kingdom)]; Sera, Terez [University of Szeged, Department of Nuclear Medicine and Euromedic Szeged, Szeged (Hungary)]; Asenbaum, Susanne [Medical University of Vienna, Department of Neurology, Vienna (Austria)]; Booij, Jan [University of Amsterdam, Department of Nuclear Medicine, Academic Medical Centre, Amsterdam (Netherlands)]; Kapucu, Ozlem L. [Gazi University, Department of Nuclear Medicine, Faculty of Medicine, Ankara (Turkey)]; Kluge, Andreas [ABX-CRO, Dresden (Germany)]; Ziebell, Morten [Rigshospitalet and University of Copenhagen, Neurobiology Research Unit, Copenhagen (Denmark)]; Darcourt, Jacques [University of Nice-Sophia Antipolis, Nuclear Medicine Department, Centre Antoine Lacassagne, Nice (France)]; Nobili, Flavio [University of Genoa, Clinical Neurology Unit, Department of Neuroscience (DINOGMI), Genoa (Italy)]; Pagani, Marco [CNR, Institute of Cognitive Sciences and Technologies, Rome (Italy); Karolinska Hospital, Department of Nuclear Medicine, Stockholm (Sweden)]; Hesse, Swen [University of Leipzig, Department of Nuclear Medicine, Leipzig (Germany); Leipzig University Medical Centre, Molecular Neuroimaging IFB Adiposity Diseases, Leipzig (Germany)]; Borght, Thierry Vander [Universite Catholique de Louvain, Nuclear Medicine Division, CHU Dinant Godinne, Yvoir (Belgium)]; Laere, Koen van [University Hospital and K.U. Leuven, Nuclear Medicine, Leuven (Belgium)]; Tatsch, Klaus [Staedtisches Klinikum Karlsruhe, Department of Nuclear Medicine, Karlsruhe (Germany)]; La Fougere, Christian [University of Munich, Department of Nuclear Medicine, Munich (Germany); University of Tuebingen, Department of Nuclear Medicine, Tuebingen (Germany)]

    2014-10-15

    Apart from binding to the dopamine transporter (DAT), [(123)I]FP-CIT shows moderate affinity for the serotonin transporter (SERT), allowing imaging of both monoamine transporters in a single imaging session in different brain areas. The aim of this study was to systematically evaluate extrastriatal binding (predominantly due to SERT) and its age and gender dependencies in a large cohort of healthy controls. SPECT data from 103 healthy controls with well-defined criteria of normality acquired at 13 different imaging centres were analysed for extrastriatal binding using volumes of interest analysis for the thalamus and the pons. Data were examined for gender and age effects as well as for potential influence of striatal DAT radiotracer binding. Thalamic binding was significantly higher than pons binding. Partial correlations showed an influence of putaminal DAT binding on measured binding in the thalamus but not on the pons. Data showed high interindividual variation in extrastriatal binding. Significant gender effects with 31 % higher binding in women than in men were observed in the thalamus, but not in the pons. An age dependency with a decline per decade (±standard error) of 8.2 ± 1.3 % for the thalamus and 6.8 ± 2.9 % for the pons was shown. The potential to evaluate extrastriatal predominant SERT binding in addition to the striatal DAT in a single imaging session was shown using a large database of [(123)I]FP-CIT scans in healthy controls. For both the thalamus and the pons, an age-related decline in radiotracer binding was observed. Gender effects were demonstrated for binding in the thalamus only. As a potential clinical application, the data could be used as a reference to estimate SERT occupancy in addition to nigrostriatal integrity when using [(123)I]FP-CIT for DAT imaging in patients treated with selective serotonin reuptake inhibitors.

  2. The effects of d-amphetamine on extrastriatal dopamine D2/D3 receptors: a randomized, double-blind, placebo-controlled PET study with [(11)C]FLB 457 in healthy subjects

    Energy Technology Data Exchange (ETDEWEB)

    Aalto, Sargo [University of Turku, Turku PET Centre, Turku (Finland); Aabo Akademi University, Department of Psychology, Turku (Finland)]; Hirvonen, Jussi; Kajander, Jaana; Naagren, Kjell; Rinne, Juha O. [University of Turku, Turku PET Centre, Turku (Finland)]; Kaasinen, Valtteri [University of Turku, Department of Neurology, P.O. Box 52, Turku (Finland)]; Hagelberg, Nora [University of Turku, Turku PET Centre, Turku (Finland); Turku University Central Hospital, Department of Anaesthesiology, Intensive Care, Emergency Care and Pain Medicine, Turku (Finland)]; Seppaelae, Timo [Drug Research Unit, National Public Health Institute, Helsinki (Finland)]; Scheinin, Harry [University of Turku, Turku PET Centre, Turku (Finland); University of Turku, Department of Pharmacology, Drug Development and Therapeutics, Turku (Finland)]; Hietala, Jarmo [University of Turku, Turku PET Centre, Turku (Finland); University of Turku, Department of Psychiatry, Turku (Finland)]

    2009-03-15

    The dopamine D2/D3 receptor ligand [(11)C]FLB 457 and PET enable quantification of low-density extrastriatal D2/D3 receptors, but it is uncertain whether [(11)C]FLB 457 can be used for measuring extrastriatal dopamine release. We studied the effects of d-amphetamine (0.3 mg/kg i.v.) on extrastriatal [(11)C]FLB 457 binding potential (BPND) in a randomized, double-blind, placebo-controlled study including 24 healthy volunteers. The effects of d-amphetamine on [(11)C]FLB 457 BPND and distribution volume (VT) in the frontal cortex were not different from those of placebo. Small decreases in [(11)C]FLB 457 BPND were observed only in the posterior cingulate and hippocampus. The regional changes in [(11)C]FLB 457 BPND did not correlate with d-amphetamine-induced changes in subjective ratings of euphoria. This placebo-controlled study showed that d-amphetamine does not induce marked changes in measures of extrastriatal dopamine D2/D3 receptor binding. Our results indicate that [(11)C]FLB 457 PET is not a useful method for measuring extrastriatal dopamine release in humans.

  3. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    OpenAIRE

    He Cui; Xi Lan; Shemin Lu; Fujun Zhang; Wanggang Zhang

    2017-01-01

    Our previous study demonstrated that the human KIAA0100 gene is a novel acute monocytic leukemia-associated antigen (MLAA) gene, but its functional characterization has remained unknown to date. Here, firstly, bioinformatic prediction of the human KIAA0100 gene was carried out using online software tools; secondly, human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system in U937 cells...

  4. Clinicopathologic and gene expression parameters predict liver cancer prognosis

    Directory of Open Access Journals (Sweden)

    Hao Ke

    2011-11-01

    Background: The prognosis of hepatocellular carcinoma (HCC) varies following surgical resection, and the large variation remains largely unexplained. Studies have revealed the ability of clinicopathologic parameters and gene expression to predict HCC prognosis. However, there has been little systematic effort to compare the performance of these two types of predictors or combine them in a comprehensive model. Methods: Tumor and adjacent non-tumor liver tissues were collected from 272 ethnic Chinese HCC patients who received curative surgery. We combined clinicopathologic parameters and gene expression data (from both tissue types) in predicting HCC prognosis. Cross-validation and independent studies were employed to assess prediction. Results: HCC prognosis was significantly associated with six clinicopathologic parameters, which can partition the patients into good- and poor-prognosis groups. Within each group, gene expression data further divide patients into distinct prognostic subgroups. Our predictive genes significantly overlap with previously published gene sets predictive of prognosis. Moreover, the predictive genes were enriched for genes that underwent normal-to-tumor gene network transformation. Previously documented liver eSNPs underlying the HCC predictive gene signatures were enriched for SNPs that associated with HCC prognosis, providing support that these genes are involved in key processes of tumorigenesis. Conclusion: When applied individually, clinicopathologic parameters and gene expression offered similar predictive power for HCC prognosis. In contrast, a combination of the two types of data dramatically improved the power to predict HCC prognosis. Our results also provided a framework for understanding the impact of gene expression on the processes of tumorigenesis and clinical outcome.

  5. Gene expression profiling predicts the development of oral cancer.

    Science.gov (United States)

    Saintigny, Pierre; Zhang, Li; Fan, You-Hong; El-Naggar, Adel K; Papadimitrakopoulou, Vassiliki A; Feng, Lei; Lee, J Jack; Kim, Edward S; Ki Hong, Waun; Mao, Li

    2011-02-01

    Patients with oral premalignant lesion (OPL) have a high risk of developing oral cancer. Although certain risk factors, such as smoking status and histology, are known, our ability to predict oral cancer risk remains poor. The study objective was to determine the value of gene expression profiling in predicting oral cancer development. Gene expression profiles were measured in 86 of 162 OPL patients who were enrolled in a clinical chemoprevention trial that used the incidence of oral cancer development as a prespecified endpoint. The median follow-up time was 6.08 years and 35 of the 86 patients developed oral cancer over the course. Gene expression profiles were associated with oral cancer-free survival and used to develop multivariate predictive models for oral cancer prediction. We developed a 29-transcript predictive model which showed marked improvement in terms of prediction accuracy (with 8% predicting error rate) over the models using previously known clinicopathologic risk factors. On the basis of the gene expression profile data, we also identified 2,182 transcripts significantly associated with oral cancer risk-associated genes (P value [...] oral cancer risk. In multiple independent data sets, the expression profiles of the genes can differentiate head and neck cancer from normal mucosa. Our results show that gene expression profiles may improve the prediction of oral cancer risk in OPL patients and the significant genes identified may serve as potential targets for oral cancer chemoprevention. ©2011 AACR.

  6. Using intron position conservation for homology-based gene prediction.

    Science.gov (United States)

    Keilwagen, Jens; Wenk, Michael; Erickson, Jessica L; Schattat, Martin H; Grau, Jan; Hartung, Frank

    2016-05-19

    Annotation of protein-coding genes is very important in bioinformatics and biology and has a decisive influence on many downstream analyses. Homology-based gene prediction programs allow for transferring knowledge about protein-coding genes from an annotated organism to an organism of interest. Here, we present a homology-based gene prediction program called GeMoMa. GeMoMa utilizes the conservation of intron positions within genes to predict related genes in other organisms. We assess the performance of GeMoMa and compare it with state-of-the-art competitors on plant and animal genomes using an extended best reciprocal hit approach. We find that GeMoMa often makes more precise predictions than its competitors, yielding a substantially increased number of correct transcripts. Subsequently, we exemplarily validate GeMoMa predictions using Sanger sequencing. Finally, we use RNA-seq data to compare the predictions of homology-based gene prediction programs, and find again that GeMoMa performs well. Hence, we conclude that exploiting intron position conservation improves homology-based gene prediction, and we make GeMoMa freely available as a command-line tool and Galaxy integration. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis

    OpenAIRE

    Noar, Roslyn D.; Daub, Margaret E.

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the numb...

  8. Systematic Characterization and Prediction of Human Hypertension Genes.

    Science.gov (United States)

    Li, Yan-Hui; Zhang, Gai-Gai; Wang, Nanping

    2017-02-01

    Hypertension is a major cardiovascular risk factor and accounts for a large part of cardiovascular mortality. In this work, we analyzed the properties of hypertension genes and found that when compared with genes not yet known to be involved in hypertension regulation, known hypertension genes display distinguishing features: (1) hypertension genes tend to be located at network center; (2) hypertension genes tend to interact with each other; and (3) hypertension genes tend to enrich in certain biological processes and show certain phenotypes. Based on these features, we developed a machine-learning algorithm to predict new hypertension genes. One hundred and seventy-seven candidates were predicted with a posterior probability >0.9. Evidence supporting 17 of the predictions has been found. © 2016 American Heart Association, Inc.

  9. A Brief Review of Computational Gene Prediction Methods

    Institute of Scientific and Technical Information of China (English)

    Zhuo Wang; Yazhu Chen; Yixue Li

    2004-01-01

    With the development of genome sequencing for many organisms, more and more raw sequences need to be annotated. Gene prediction by computational methods for finding the location of protein coding regions is one of the essential issues in bioinformatics. Two classes of methods are generally adopted: similarity based searches and ab initio prediction. Here, we review the development of gene prediction methods, summarize the measures for evaluating predictor quality, highlight open problems in this area, and discuss future research directions.

  10. Building predictive gene signatures through simultaneous assessment of transcription factor activation and gene expression.

    Science.gov (United States)

    Exposure to many drugs and environmentally-relevant chemicals can cause adverse outcomes. These adverse outcomes, such as cancer, have been linked to mol...

  11. In silico network topology-based prediction of gene essentiality

    CERN Document Server

    da Silva, Joao Paulo Muller; Mombach, Jose Carlos Merino; Vieira, Renata; da Silva, Jose Guliherme Camargo; Lemke, Ney; Sinigaglia, Marialva

    2007-01-01

    The identification of genes essential for survival is important for the understanding of the minimal requirements for cellular life and for drug design. As experimental studies with the purpose of building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach which could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists in training a decision tree-based machine learning algorithm on known essential and non-essential genes of the organism of interest, considering as learning attributes the network topology information for each of these genes...
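
    The two steps named in this abstract (derive network topology features, then train a tree-based classifier on known essential and non-essential genes) can be sketched in a deliberately reduced form. This is not NTPGE itself: it assumes a single topology feature (node degree) and a depth-1 decision tree, and the gene names, edges, and labels are hypothetical toy data.

```python
from collections import defaultdict

def degree_by_gene(edges):
    """Count distinct interaction partners per gene in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {gene: len(partners) for gene, partners in neighbours.items()}

def fit_stump(values, labels):
    """Fit a depth-1 decision tree: predict 'essential' when value >= threshold.

    Returns the candidate threshold with the fewest training errors.
    """
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(values)):
        errors = sum((v >= threshold) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

# Hypothetical toy network and essentiality labels (not from the paper).
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
         ("g2", "g3"), ("g2", "g5"), ("g5", "g6")]
essential = {"g1": True, "g2": True, "g3": False,
             "g4": False, "g5": False, "g6": False}

degree = degree_by_gene(edges)
genes = sorted(essential)
threshold = fit_stump([degree[g] for g in genes], [essential[g] for g in genes])
predicted = {g: degree[g] >= threshold for g in genes}
print(threshold, predicted)
```

    A realistic version would use several topology features per gene and a full decision tree evaluated on held-out genes; the stump only shows how a topology feature can be turned into an essentiality rule.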

  12. Gene and translation initiation site prediction in metagenomic sequences

    Energy Technology Data Exchange (ETDEWEB)

    Hyatt, Philip Douglas [ORNL]; LoCascio, Philip F [ORNL]; Hauser, Loren John [ORNL]; Uberbacher, Edward C [ORNL]

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  13. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  14. Embryo quality predictive models based on cumulus cells gene expression

    Directory of Open Access Journals (Sweden)

    Devjak R

    2016-06-01

    Since the introduction of in vitro fertilization (IVF) in clinical practice of infertility treatment, the indicators for high quality embryos have been investigated. Cumulus cells (CC) have a specific gene expression profile according to the developmental potential of the oocyte they are surrounding, and therefore, specific gene expression could be used as a biomarker. The aim of our study was to combine more than one biomarker to observe improvement in prediction value of embryo development. In this study, 58 CC samples from 17 IVF patients were analyzed. This study was approved by the Republic of Slovenia National Medical Ethics Committee. Gene expression analysis [quantitative real time polymerase chain reaction (qPCR)] for five genes, analyzed according to embryo quality level, was performed. Two prediction models were tested for embryo quality prediction: a binary logistic and a decision tree model. As the main outcome, gene expression levels for five genes were taken and the area under the curve (AUC) for two prediction models was calculated. Among tested genes, AMHR2 and LIF showed significant expression difference between high quality and low quality embryos. These two genes were used for the construction of two prediction models: the binary logistic model yielded an AUC of 0.72 ± 0.08 and the decision tree model yielded an AUC of 0.73 ± 0.03. Two different prediction models yielded similar predictive power to differentiate high and low quality embryos. In terms of eventual clinical decision making, the decision tree model resulted in easy-to-interpret rules that are highly applicable in clinical practice.
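
    Both models in this abstract are compared by their area under the ROC curve (AUC). As a minimal sketch of what that number measures, the function below computes the AUC as the fraction of (high quality, low quality) pairs in which the high quality embryo receives the higher model score, with ties counted as half. The scores and labels are hypothetical and are not data from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores; label 1 = high quality embryo, 0 = low quality.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for the reported values of 0.72 and 0.73.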

  15. Predicting Gene Structures from Multiple RT-PCR Tests

    Science.gov (United States)

    Kováč, Jakub; Vinař, Tomáš; Brejová, Broňa

    It has been demonstrated that the use of additional information such as ESTs and protein homology can significantly improve accuracy of gene prediction. However, many sources of external information are still being omitted from consideration. Here, we investigate the use of product lengths from RT-PCR experiments in gene finding. We present hardness results and practical algorithms for several variants of the problem and apply our methods to a real RT-PCR data set in the Drosophila genome. We conclude that the use of RT-PCR data can improve the sensitivity of gene prediction and locate novel splicing variants.

  16. Engineering genes for predictable protein expression.

    Science.gov (United States)

    Gustafsson, Claes; Minshull, Jeremy; Govindarajan, Sridhar; Ness, Jon; Villalobos, Alan; Welch, Mark

    2012-05-01

    The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Lack of readily available tools has until recently inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. We here discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering.

  17. Prediction of anther-expressed gene regulation in Arabidopsis

    Institute of Scientific and Technical Information of China (English)

    HUANG JiFeng; YANG JingJin; WANG Guan; YU QingBo; YANG ZhongNan

    2008-01-01

    Anther development in Arabidopsis, a popular model plant for plant biology and genetics, is controlled by a complex gene network. Despite the extensive use of this genus for genetic research, little is known about its regulatory network. In this paper, the direct transcriptional regulatory relationships between genes expressed in Arabidopsis anther development were predicted with an integrated bioinformatic method that combines mining of microarray data with promoter analysis. A total of 7710 transcription factor-gene pairs were obtained. The 80 direct regulatory relationships demonstrating the highest confidence were screened from the initial 7710 pairs; three of the 80 were validated by previous experiments. The results indicate that our predicted results were reliable. The regulatory relationships revealed by this research and described in this paper may facilitate further investigation of the molecular mechanisms of anther development. The bioinformatic method used in this work can also be applied to the prediction of gene regulatory relationships in other organisms.

  18. Predictability of Genetic Interactions from Functional Gene Modules

    Directory of Open Access Journals (Sweden)

    Jonathan H. Young

    2017-02-01

    Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.

  19. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    OpenAIRE

    Zhen Li; Bi-Qing Li; Min Jiang; Lei Chen; Jian Zhang; Lin Liu; Tao Huang

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance...

  20. Reranking candidate gene models with cross-species comparison for improved gene prediction

    Directory of Open Access Journals (Sweden)

    Pereira Fernando CN

    2008-10-01

    Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
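
    The reranking idea in this abstract (take the K best candidate models for a locus, then reorder them using cross-species evidence) can be sketched as follows. The scoring rule here, a weighted sum of the gene finder score and one ortholog similarity value, is an assumption chosen for illustration; the abstract does not specify the authors' global scoring rules, and the candidate names and numbers are hypothetical.

```python
def rerank(candidates, weight=1.0):
    """Reorder candidate gene models by finder score plus weighted ortholog similarity.

    Hypothetical scoring rule: combined = finder_score + weight * ortholog_similarity.
    """
    return sorted(
        candidates,
        key=lambda c: c["finder_score"] + weight * c["ortholog_similarity"],
        reverse=True,
    )

# K = 3 hypothetical candidate models for one locus, listed in gene finder order.
candidates = [
    {"name": "model_a", "finder_score": 0.90, "ortholog_similarity": 0.20},
    {"name": "model_b", "finder_score": 0.85, "ortholog_similarity": 0.80},
    {"name": "model_c", "finder_score": 0.60, "ortholog_similarity": 0.40},
]

reranked = rerank(candidates)
print([c["name"] for c in reranked])
```

    In this toy case the second-ranked model moves to the top because of its stronger similarity to a putative ortholog, which mirrors the behaviour the abstract describes for lower-scoring candidates.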

  1. The MAOA gene predicts happiness in women.

    Science.gov (United States)

    Chen, Henian; Pine, Daniel S; Ernst, Monique; Gorodetsky, Elena; Kasen, Stephanie; Gordon, Kathy; Goldman, David; Cohen, Patricia

    2013-01-10

    Psychologists, quality of life and well-being researchers have grown increasingly interested in understanding the factors that are associated with human happiness. Although twin studies estimate that genetic factors account for 35-50% of the variance in human happiness, knowledge of specific genes is limited. However, recent advances in molecular genetics can now provide a window into neurobiological markers of human happiness. This investigation examines the association between happiness and monoamine oxidase A (MAOA) genotype. Data were drawn from a longitudinal study of a population-based cohort, followed for three decades. In women, low expression of MAOA (MAOA-L) was related significantly to greater happiness (0.261 SD increase with one L-allele, 0.522 SD with two L-alleles, P=0.002) after adjusting for the potential effects of age, education, household income, marital status, employment status, mental disorder, physical health, relationship quality, religiosity, abuse history, recent negative life events and self-esteem in linear regression models. In contrast, no such association was found in men. This new finding may help explain the gender difference in happiness and provide a link between MAOA and human happiness. Copyright © 2012 Elsevier Inc. All rights reserved.

  2. Global discriminative learning for higher-accuracy computational gene prediction.

    Directory of Open Access Journals (Sweden)

    Axel Bernal

    2007-03-01

    Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVMs) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.

  3. A predictive approach to identify genes differentially expressed

    Science.gov (United States)

    Saraiva, Erlandson F.; Louzada, Francisco; Milan, Luís A.; Meira, Silvana; Cobre, Juliana

    2012-10-01

    The main objective of gene expression data analysis is to identify genes that present significant changes in expression levels between a treatment and a control biological condition. In this paper, we propose a Bayesian approach to identify genes differentially expressed calculating credibility intervals from predictive densities which are constructed using sampled mean treatment effect from all genes in study excluding the treatment effect of genes previously identified with statistical evidence for difference. We compare our Bayesian approach with the standard ones based on the use of the t-test and modified t-tests via a simulation study, using small sample sizes which are common in gene expression data analysis. Results obtained indicate that the proposed approach performs better than standard ones, especially for cases with mean differences and increases in treatment variance in relation to control variance. We also apply the methodologies to a publicly available data set on Escherichia coli bacteria.

  4. Combining gene signatures improves prediction of breast cancer survival.

    Directory of Open Access Journals (Sweden)

    Xi Zhao

    Full Text Available BACKGROUND: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. PRINCIPAL FINDINGS: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. CONCLUSION: Combining the predictive strength of multiple gene signatures improves

  5. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    Directory of Open Access Journals (Sweden)

    He Cui

    2017-02-01

    Full Text Available Our previous study demonstrated that the human KIAA0100 gene was a novel acute monocytic leukemia-associated antigen (MLAA) gene. However, the functional characterization of the human KIAA0100 gene has remained unknown to date. Here, firstly, bioinformatic prediction of the human KIAA0100 gene was carried out using online software; secondly, human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system in U937 cells. Cell proliferation and apoptosis were next evaluated in KIAA0100-knockdown U937 cells. The bioinformatic prediction showed that the human KIAA0100 gene was located on 17q11.2, and the human KIAA0100 protein was located in the secretory pathway. In addition, the human KIAA0100 protein contained a signal peptide, a transmembrane region, three types of secondary structures (alpha helix, extended strand, and random coil), and four domains from mitochondrial protein 27 (FMP27). The observation on functional characterization of the human KIAA0100 gene revealed that its downregulation inhibited cell proliferation and promoted cell apoptosis in U937 cells. To summarize, these results suggest the human KIAA0100 gene possibly comes within mitochondrial genome; moreover, it is a novel anti-apoptotic factor related to carcinogenesis or progression in acute monocytic leukemia, and may be a potential target for immunotherapy against acute monocytic leukemia.

  6. A network approach to predict pathogenic genes for Fusarium graminearum.

    Directory of Open Access Journals (Sweden)

    Xiaoping Liu

    Full Text Available Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), which is a destructive disease on wheat and barley, thereby causing huge economic loss and health problems to human by contaminating foods. Identifying pathogenic genes can shed light on pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which imply that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including G protein coupled receptor pathway and MAPK signaling pathway, which are known related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other

  7. A Network Approach to Predict Pathogenic Genes for Fusarium graminearum

    Science.gov (United States)

    Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan

    2010-01-01

    Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), which is a destructive disease on wheat and barley, thereby causing huge economic loss and health problems to human by contaminating foods. Identifying pathogenic genes can shed light on pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which imply that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including G protein coupled receptor pathway and MAPK signaling pathway, which are known related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which
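
    The seed-gene idea in this abstract can be illustrated with a minimal sketch. It assumes a simple first-neighbour expansion from the seed genes followed by a differential-expression filter; the abstract does not give the paper's actual subnetwork identification procedure, and the gene names, edges, and fold-change threshold below are all hypothetical.

```python
from collections import defaultdict


def candidate_genes(edges, seeds, log2_fold_change, threshold=1.0):
    """Return non-seed genes that interact directly with a seed gene and
    whose absolute log2 fold change reaches the (assumed) threshold."""
    # Build an undirected adjacency map from the interaction list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    candidates = set()
    for seed in seeds:
        for gene in neighbours[seed]:
            if gene in seeds:
                continue  # seeds are already known, not candidates
            if abs(log2_fold_change.get(gene, 0.0)) >= threshold:
                candidates.add(gene)
    return sorted(candidates)


# Hypothetical toy data: two seed genes and three other genes.
edges = [
    ("seedA", "g1"),
    ("seedA", "g2"),
    ("g2", "g3"),
    ("seedB", "g3"),
    ("seedB", "seedA"),
]
seeds = {"seedA", "seedB"}
log2_fold_change = {"g1": 2.3, "g2": 0.2, "g3": -1.7}

print(candidate_genes(edges, seeds, log2_fold_change))
```

    Here `g2` is dropped because its expression barely changes, while `g1` and `g3` pass both the network and the expression criteria.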

  8. Using effective subnetworks to predict selected properties of gene networks.

    Directory of Open Access Journals (Sweden)

    Gemunu H Gunaratne

    Full Text Available BACKGROUND: Difficulties associated with implementing gene therapy are caused by the complexity of the underlying regulatory networks. The forms of interactions between the hundreds of genes, proteins, and metabolites in these networks are not known very accurately. An alternative approach is to limit consideration to genes on the network. Steady state measurements of these influence networks can be obtained from DNA microarray experiments. However, since they contain a large number of nodes, the computation of influence networks requires a prohibitively large set of microarray experiments. Furthermore, error estimates of the network make verifiable predictions impossible. METHODOLOGY/PRINCIPAL FINDINGS: Here, we propose an alternative approach. Rather than attempting to derive an accurate model of the network, we ask what questions can be addressed using lower dimensional, highly simplified models. More importantly, is it possible to use such robust features in applications? We first identify a small group of genes that can be used to affect changes in other nodes of the network. The reduced effective empirical subnetwork (EES) can be computed using steady state measurements on a small number of genetically perturbed systems. We show that the EES can be used to make predictions on expression profiles of other mutants, and to compute how to implement pre-specified changes in the steady state of the underlying biological process. These assertions are verified in a synthetic influence network. We also use previously published experimental data to compute the EES associated with an oxygen deprivation network of E. coli, and use it to predict gene expression levels on a double mutant. The predictions are significantly different from the experimental results for less than of genes. CONCLUSIONS/SIGNIFICANCE: The constraints imposed by gene expression levels of mutants can be used to address a selected set of questions about a gene network.

  9. Ontology-Based Prediction and Prioritization of Gene Functional Annotations.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2016-01-01

    Genes and their protein products are essential molecular units of a living organism. The knowledge of their functions is key for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. The association of a gene or protein with its functions, described by controlled terms of biomolecular terminologies or ontologies, is named gene functional annotation. Very many and valuable gene annotations expressed through terminologies and ontologies are available. Nevertheless, they might include some erroneous information, since only a subset of annotations are reviewed by curators. Furthermore, they are incomplete by definition, given the rapidly evolving pace of biomolecular knowledge. In this scenario, computational methods that are able to quicken the annotation curation process and reliably suggest new annotations are very important. Here, we first propose a computational pipeline that uses different semantic and machine learning methods to predict novel ontology-based gene functional annotations; then, we introduce a new semantic prioritization rule to categorize the predicted annotations by their likelihood of being correct. Our tests and validations proved the effectiveness of our pipeline and prioritization of predicted annotations, by selecting as most likely manifold predicted annotations that were later confirmed.

  10. Information theory applied to the sparse gene ontology annotation network to predict novel gene function

    Science.gov (United States)

    Tao, Ying; Li, Jianrong

    2010-01-01

    Motivation: Despite advances in the gene annotation process, the functions of a large portion of the gene products remain insufficiently characterized. In addition, the “in silico” prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or function genomics approaches. Results: We propose a novel approach, Information Theory-based Semantic Similarity (ITSS), to automatically predict molecular functions of genes based on Gene Ontology annotations. We have demonstrated using a 10-fold cross-validation that the ITSS algorithm obtains prediction accuracies (Precision 97%, Recall 77%) comparable to other machine learning algorithms when applied to similarly dense annotated portions of the GO datasets. In addition, the method can generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms failed to do so. As a result, our technique generates an order of magnitude more gene function predictions than previous methods. Further, this paper presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions for an evaluation than the generally used cross-validation type of evaluation. By manually assessing a random sample of 100 predictions conducted in a historical roll-back evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43%–58%) can be achieved for the human GO Annotation file dated 2003. Availability: The program is available on request. The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset are available at http://phenos.bsd.uchicago.edu/mphenogo/prediction_result_2005.txt. PMID:17646340
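
    To make the phrase "information theory-based semantic similarity" concrete, the sketch below computes a Resnik-style similarity between two ontology terms: the information content, -log2 p(t), of their most informative common ancestor. This is a well-known textbook measure used here only as an illustration; it is not the ITSS algorithm, whose details the abstract does not give. The toy ontology and annotation counts are hypothetical.

```python
import math


def information_content(term, annotation_counts, total):
    """IC(t) = -log2 p(t), where p(t) is the fraction of genes annotated
    to the term or to any of its descendants."""
    return -math.log2(annotation_counts[term] / total)


def ancestors(term, parents):
    """All ancestors of a term in the ontology graph, including itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik_similarity(t1, t2, parents, annotation_counts, total):
    """Information content of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(information_content(t, annotation_counts, total) for t in common)


# Hypothetical toy ontology (child -> list of parents) and annotation counts.
parents = {
    "kinase": ["catalytic"],
    "phosphatase": ["catalytic"],
    "catalytic": ["root"],
    "binding": ["root"],
}
annotation_counts = {
    "root": 100,
    "catalytic": 25,
    "kinase": 5,
    "phosphatase": 10,
    "binding": 50,
}
total = 100

print(resnik_similarity("kinase", "phosphatase", parents, annotation_counts, total))
print(resnik_similarity("kinase", "binding", parents, annotation_counts, total))
```

    Terms that share only the root score zero, because the root is annotated to every gene and therefore carries no information.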

  11. Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models

    Directory of Open Access Journals (Sweden)

    Smith Terry J

    2004-03-01

    Full Text Available Background: Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation. Results: This work explores a new approach to gene-prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential. Conclusions: While its raw accuracy rate can be less than other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.
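
    Relative synonymous codon usage (RSCU), the feature named in this abstract, has a standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a single in-frame sequence using the standard genetic code. It illustrates only the feature, not RescueNet or the Self-Organizing Map, and the example sequence is made up.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(sequence):
    """Return {codon: RSCU} for every codon of each amino acid that
    occurs in the in-frame DNA sequence (stop codons are ignored)."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    values = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in synonymous)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(synonymous)
        for codon in synonymous:
            values[codon] = counts[codon] / expected
    return values


# Made-up example: four leucine codons, three of them CTG.
example = rscu("CTGCTGCTGTTA")
for codon in sorted(example):
    print(codon, round(example[codon], 2))
```

    An RSCU above 1 means the codon is used more often than expected under uniform synonymous usage; a value of 0 means it never appears even though the amino acid does.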

  12. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...... can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence derived protein features such as predicted post translational modifications (PTMs), protein sorting signals and physical/chemical properties...

  13. Prediction and analysis of retinoblastoma related genes through gene ontology and KEGG.

    Science.gov (United States)

    Li, Zhen; Li, Bi-Qing; Jiang, Min; Chen, Lei; Zhang, Jian; Liu, Lin; Huang, Tao

    2013-01-01

    One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB related studies, while 5,500 non-RB genes were randomly selected from Ensemble genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict the other cancer related genes as well.

  14. Prediction and Analysis of Retinoblastoma Related Genes through Gene Ontology and KEGG

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2013-01-01

    Full Text Available One of the most important and challenging problems in biomedicine is how to predict the cancer related genes. Retinoblastoma (RB) is the most common primary intraocular malignancy usually occurring in childhood. Early detection of RB could reduce the morbidity and promote the probability of disease-free survival. Therefore, it is of great importance to identify RB genes. In this study, we developed a computational method to predict RB related genes based on Dagging, with the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). 119 RB genes were compiled from two previous RB related studies, while 5,500 non-RB genes were randomly selected from Ensemble genes. Ten datasets were constructed based on all these RB and non-RB genes. Each gene was encoded with a 13,126-dimensional vector including 12,887 Gene Ontology enrichment scores and 239 KEGG enrichment scores. Finally, an optimal feature set including 1061 GO terms and 8 KEGG pathways was obtained. Analysis showed that these features were closely related to RB. It is anticipated that the method can be applied to predict the other cancer related genes as well.

  15. Gene-specific function prediction for non-synonymous mutations in monogenic diabetes genes.

    Directory of Open Access Journals (Sweden)

    Quan Li

    Full Text Available The rapid progress of genomic technologies has been providing new opportunities to address the need of maturity-onset diabetes of the young (MODY) molecular diagnosis. However, whether a new mutation causes MODY can be questionable. A number of in silico methods have been developed to predict functional effects of rare human mutations. The purpose of this study is to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores by different methods of the diabetes mutations were highly correlated, but were more complementary than replacement to each other. The available in silico methods for the prediction of diabetes mutations had varied performances across different genes. Applying gene-specific thresholds defined by this study may be able to increase the performance of in silico prediction of disease-causing mutations.

  16. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Full Text Available Background: Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description: We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web-service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO-terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion: Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  17. Predictive screening for regulators of conserved functional gene modules (gene batteries) in mammals

    Directory of Open Access Journals (Sweden)

    Sigvardsson Mikael

    2005-05-01

    Full Text Available Background: The expression of gene batteries, genomic units of functionally linked genes which are activated by similar sets of cis- and trans-acting regulators, has been proposed as a major determinant of cell specialization in metazoans. We developed a predictive procedure to screen the mouse and human genomes and transcriptomes for cases of gene-battery-like regulation. Results: In a screen that covered ~40 per cent of all annotated protein-coding genes, we identified 21 co-expressed gene clusters with statistically supported sharing of cis-regulatory sequence elements. 66 predicted cases of over-represented transcription factor binding motifs were validated against the literature and fell into three categories: (i) previously described cases of gene battery-like regulation, (ii) previously unreported cases of gene battery-like regulation with some support in a limited number of genes, and (iii) predicted cases that currently lack experimental support. The novel predictions include for example Sox 17 and RFX transcription factor binding sites that were detected in ~10% of all testis specific genes, and HNF-1 and 4 binding sites that were detected in ~30% of all kidney specific genes respectively. The results are publicly available at http://www.wlab.gu.se/lindahl/genebatteries. Conclusion: 21 co-expressed gene clusters were enriched for a total of 66 shared cis-regulatory sequence elements. A majority of these predictions represent novel cases of potential co-regulation of functionally coupled proteins. Critical technical parameters were evaluated, and the results and the methods provide a valuable resource for future experimental design.

  18. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories-transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  19. Prediction of epigenetically regulated genes in breast cancer cell lines

    Energy Technology Data Exchange (ETDEWEB)

    Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen; Nautiyal, Shivani; Flaucher, Diane; Carlton, Victoria EH; Moorhead, Martin; Lu, Yontao; Gray, Joe W; Faham, Malek; Spellman, Paul; Parvin, Bahram

    2010-05-04

    panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.

  20. Prediction of epigenetically regulated genes in breast cancer cell lines

    Directory of Open Access Journals (Sweden)

    Lu Yontao

    2010-06-01

    methylation profiles and gene expression in the panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Conclusions: Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.

  1. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that

  2. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  3. Ion channel gene expression predicts survival in glioma patients.

    Science.gov (United States)

    Wang, Rong; Gurguis, Christopher I; Gu, Wanjun; Ko, Eun A; Lim, Inja; Bang, Hyoweon; Zhou, Tong; Ko, Jae-Hong

    2015-08-03

    Ion channels are important regulators in cell proliferation, migration, and apoptosis. The malfunction and/or aberrant expression of ion channels may disrupt these important biological processes and influence cancer progression. In this study, we investigate the expression pattern of ion channel genes in glioma. We designate 18 ion channel genes that are differentially expressed in high-grade glioma as a prognostic molecular signature. This ion channel gene expression based signature predicts glioma outcome in three independent validation cohorts. Interestingly, 16 of these 18 genes were down-regulated in high-grade glioma. This signature is independent of traditional clinical, molecular, and histological factors. Resampling tests indicate that the prognostic power of the signature outperforms random gene sets selected from human genome in all the validation cohorts. More importantly, this signature performs better than the random gene signatures selected from glioma-associated genes in two out of three validation datasets. This study implicates ion channels in brain cancer, thus expanding on knowledge of their roles in other cancers. Individualized profiling of ion channel gene expression serves as a superior and independent prognostic tool for glioma patients.

  4. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    Science.gov (United States)

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years, mainly in the field of biological research. The empirical methods for candidate gene prioritization that help to explore the potential pathway between genetic determinants and complex diseases are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The wide availability of protein interaction data coupled with gene annotation eases the accurate determination of disease-specific candidate genes. In our work we have prioritized the cervix related cancer candidate genes by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. With the advantage of the human protein interaction data, cervical cancer gene sets and the ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the anticipated candidate genes was corroborated through a literature survey. Also, the presence of drugs for these candidates was detected through the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which affirms that they may serve as potential drug targets for cervical cancer.

  5. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

    Science.gov (United States)

    Chen, Lei; Zhang, Yu-Hang; Wang, ShaoPeng; Zhang, YunHua; Huang, Tao; Cai, Yu-Dong

    2017-01-01

    Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways with these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) was adopted. Then, the incremental feature selection (IFS) and support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes were provided in this study, which were predicted to be essential genes by our prediction model. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
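
    The Matthews correlation coefficient quoted in this abstract can be computed from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code; the function name and the convention of returning 0.0 for an undefined denominator are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are the numbers of true positives, true negatives,
    false positives and false negatives of a binary classifier.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when any marginal count is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from the study.
    print(matthews_corrcoef(tp=95, tn=97, fp=3, fn=5))
```

    The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why a value of 0.951 is described as nearly perfect.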

  6. Evolving DNA motifs to predict GeneChip probe performance

    Directory of Open Access Journals (Sweden)

    Harrison AP

    2009-03-01

    Background: Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low correlation indicates a poor probe. Results: Regular expressions can be automatically created from a Backus-Naur form (BNF) context-free grammar using strongly typed genetic programming. Conclusion: The automatically produced motif is better at predicting poor DNA sequences than an existing human generated RE, suggesting runs of Cytosine and Guanine and mixtures should all be avoided.

  7. Identification of rat genes by TWINSCAN gene prediction, RT-PCR, and direct sequencing

    DEFF Research Database (Denmark)

    Wu, Jia Qian; Shteynberg, David; Arumugam, Manimozhiyan

    2004-01-01

    The publication of a draft sequence of a third mammalian genome--that of the rat--suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation efforts that are currently being devoted to human. In this paper, we demonstrate...... an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially...

  8. Building gene expression signatures indicative of transcription factor activation to predict AOP modulation

    Science.gov (United States)

    Adverse outcome pathways (AOPs) are a framework for predicting quantitative relationships between molecular initiatin...

  9. Dinucleotide controlled null models for comparative RNA gene prediction

    Directory of Open Access Journals (Sweden)

    Gesell Tanja

    2008-05-01

    Background: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. Conclusion: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require

  10. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions.

    Science.gov (United States)

    Kuppuswamy, Usha; Ananthasubramanian, Seshan; Wang, Yanli; Balakrishnan, Narayanaswamy; Ganapathiraju, Madhavi K

    2014-04-03

    The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of the action of these genes. However, GWAS studies often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is designed based on the fact that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular function, and would play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for localization and functions respectively of proteins were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest neighbor classifier confirmed that our results compared favorably. This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in
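
    The evaluation described in this abstract (precision, recall and F-score at a rank threshold such as the top 30 terms) can be illustrated with a short sketch. It is not the authors' implementation; the function name and the placeholder term labels are hypothetical.

```python
def precision_recall_f_at_k(ranked_terms, true_terms, k):
    """Precision, recall and F-score for the top-k entries of a ranked list.

    ranked_terms: predicted terms, best first.
    true_terms:   set of terms known to be correct for the protein.
    k:            rank threshold (the abstract reports results near k = 30).
    """
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Placeholder labels, not real GO identifiers.
    ranked = ["term_a", "term_b", "term_c", "term_d"]
    known = {"term_a", "term_c", "term_x"}
    print(precision_recall_f_at_k(ranked, known, k=2))
```

    Sweeping k over a range of thresholds and plotting the resulting values gives the precision-recall and F-score curves the abstract refers to.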

  11. Improving metabolic flux predictions using absolute gene expression data

    Directory of Open Access Journals (Sweden)

    Lee Dave

    2012-06-01

    Background: Constraint-based analysis of genome-scale metabolic models typically relies upon maximisation of a cellular objective function such as the rate or efficiency of biomass production. Whilst this assumption may be valid in the case of microorganisms growing under certain conditions, it is likely invalid in general, and especially for multicellular organisms, where cellular objectives differ greatly both between and within cell types. Moreover, for the purposes of biotechnological applications, it is normally the flux to a specific metabolite or product that is of interest rather than the rate of production of biomass per se. Results: An alternative objective function is presented that is based upon maximising the correlation between experimentally measured absolute gene expression data and predicted internal reaction fluxes. Using quantitative transcriptomics data acquired from Saccharomyces cerevisiae cultures under two growth conditions, the method outperforms traditional approaches for predicting experimentally measured exometabolic flux that are reliant upon maximisation of the rate of biomass production. Conclusion: Due to its improved prediction of experimentally measured metabolic fluxes, and its lack of a requirement for knowledge of the biomass composition of the organism under the conditions of interest, the approach is likely to be of rather general utility. The method has been shown to predict fluxes reliably in single cellular systems. Subsequent work will investigate the method's ability to generate condition- and tissue-specific flux predictions in multicellular organisms.

  12. Partial AUC maximization for essential gene prediction using genetic algorithms.

    Science.gov (United States)

    Hwang, Kyu-Baek; Ha, Beom-Yong; Ju, Sanghun; Kim, Sangsoo

    2013-01-01

    Identifying genes indispensable for an organism's life and their characteristics is one of the central questions in current biological research, and hence it would be helpful to develop computational approaches towards the prediction of essential genes. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method by implementing genetic algorithms to maximize the partial AUC that is restricted to a specific interval of lower false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate the over-fitting problem and to weigh each feature's relevance to prediction. We evaluated our method using the proteome of budding yeast. Our implementation of genetic algorithms maximizing the partial AUC below 0.05 or 0.10 of FPR outperformed other popular classification methods.
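
    The quantity being maximized in this abstract, the partial AUC restricted to a low-FPR interval, can be sketched as follows. This only computes the objective; the genetic algorithm that optimizes it is not shown. The function names and the trapezoidal integration with linear clipping at the FPR limit are assumptions made for illustration.

```python
def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) points, grouping tied scores into one step.

    Assumes labels are 0/1 and both classes are present.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every example that shares the current score threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(scores, labels, max_fpr):
    """Unnormalized area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    pts = roc_points(scores, labels)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy scores and labels, not data from the study.
    print(partial_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], max_fpr=0.10))
```

    With this definition a perfect ranking reaches the maximum possible value of max_fpr, so dividing by max_fpr gives a score between 0 and 1 if a normalized figure is wanted.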

  13. Accurate prediction of secondary metabolite gene clusters in filamentous fungi.

    Science.gov (United States)

    Andersen, Mikael R; Nielsen, Jakob B; Klitgaard, Andreas; Petersen, Lene M; Zachariasen, Mia; Hansen, Tilde J; Blicher, Lene H; Gotfredsen, Charlotte H; Larsen, Thomas O; Nielsen, Kristian F; Mortensen, Uffe H

    2013-01-02

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent of the biosynthetic clusters for the 58 synthases active in our set of experimental conditions. A comparison with legacy data shows the method to be accurate in 13 of 16 known clusters and nearly accurate for the remaining 3 clusters. Furthermore, we apply a data clustering approach, which identifies cross-chemistry between physically separate gene clusters (superclusters), and validate this both with legacy data and experimentally by prediction and verification of a supercluster consisting of the synthase AN1242 and the prenyltransferase AN11080, as well as identification of the product compound nidulanin A. We have used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom.

  14. Data mining approach to predict BRCA1 gene mutation

    Directory of Open Access Journals (Sweden)

    Olegas Niakšu

    2013-09-01

    Breast cancer is the most frequent cancer in women and one of the leading causes of mortality among women around the world. Patients with a pathological mutation of a BRCA gene have a 65% lifetime probability of breast cancer. It is known that such patients have a different cause of illness. In this study, we have proposed a new approach for the prediction of BRCA mutation carriers by methodically applying knowledge discovery steps and utilizing data mining methods. An alternative BRCA risk assessment model has been created utilizing a decision tree classifier model. The biggest challenge was the very small size and imbalanced nature of the initial dataset, which was collected by clinicians during 4 years of a clinical trial. Iterative optimization of the initial dataset, optimal algorithm selection and parameterization resulted in higher classifier model performance, with prediction accuracy acceptable for clinical usage. In this study, three data mining problems have been analyzed using eleven data mining algorithms.

  15. Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes

    Science.gov (United States)

    Gerstung, Moritz; Pellagatti, Andrea; Malcovati, Luca; Giagounidis, Aristoteles; Porta, Matteo G Della; Jädersten, Martin; Dolatshad, Hamid; Verma, Amit; Cross, Nicholas C. P.; Vyas, Paresh; Killick, Sally; Hellström-Lindberg, Eva; Cazzola, Mario; Papaemmanuil, Elli; Campbell, Peter J.; Boultwood, Jacqueline

    2015-01-01

    Cancer is a genetic disease, but two patients rarely have identical genotypes. Similarly, patients differ in their clinicopathological parameters, but how genotypic and phenotypic heterogeneity are interconnected is not well understood. Here we build statistical models to disentangle the effect of 12 recurrently mutated genes and 4 cytogenetic alterations on gene expression, diagnostic clinical variables and outcome in 124 patients with myelodysplastic syndromes. Overall, one or more genetic lesions correlate with expression levels of ~20% of all genes, explaining 20–65% of observed expression variability. Differential expression patterns vary between mutations and reflect the underlying biology, such as aberrant polycomb repression for ASXL1 and EZH2 mutations or perturbed gene dosage for copy-number changes. In predicting survival, genomic, transcriptomic and diagnostic clinical variables all have utility, with the largest contribution from the transcriptome. Similar observations are made on the TCGA acute myeloid leukaemia cohort, confirming the general trends reported here. PMID:25574665

  16. Specific regulatory motifs predict glucocorticoid responsiveness of hippocampal gene expression.

    Science.gov (United States)

    Datson, N A; Polman, J A E; de Jonge, R T; van Boheemen, P T M; van Maanen, E M T; Welten, J; McEwen, B S; Meiland, H C; Meijer, O C

    2011-10-01

    The glucocorticoid receptor (GR) is an ubiquitously expressed ligand-activated transcription factor that mediates effects of cortisol in relation to adaptation to stress. In the brain, GR affects the hippocampus to modulate memory processes through direct binding to glucocorticoid response elements (GREs) in the DNA. However, its effects are to a high degree cell specific, and its target genes in different cell types as well as the mechanisms conferring this specificity are largely unknown. To gain insight in hippocampal GR signaling, we characterized to which GRE GR binds in the rat hippocampus. Using a position-specific scoring matrix, we identified evolutionary-conserved putative GREs from a microarray based set of hippocampal target genes. Using chromatin immunoprecipitation, we were able to confirm GR binding to 15 out of a selection of 32 predicted sites (47%). The majority of these 15 GREs are previously undescribed and thus represent novel GREs that bind GR and therefore may be functional in the rat hippocampus. GRE nucleotide composition was not predictive for binding of GR to a GRE. A search for conserved flanking sequences that may predict GR-GRE interaction resulted in the identification of GC-box associated motifs, such as Myc-associated zinc finger protein 1, within 2 kb of GREs with GR binding in the hippocampus. This enrichment was not present around nonbinding GRE sequences nor around proven GR-binding sites from a mesenchymal stem-like cell dataset that we analyzed. GC-binding transcription factors therefore may be unique partners for DNA-bound GR and may in part explain cell-specific transcriptional regulation by glucocorticoids in the context of the hippocampus.
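
    Scanning with a position-specific scoring matrix, as used here to find putative GREs, can be sketched in a few lines. This is a generic log-odds illustration, not the authors' pipeline; the toy sites, the uniform background of 0.25, the pseudocount and the threshold are all assumptions, and the toy motif is not a real GRE.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds scoring matrix from equal-length aligned binding sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm


def scan(sequence, pssm, threshold):
    """Return (start, window, score) for every window scoring >= threshold.

    Assumes the sequence contains only the letters A, C, G and T.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    toy_sites = ["ACGT", "ACGT", "ACGA"]  # hypothetical aligned sites
    matrix = build_pssm(toy_sites)
    for hit in scan("TTACGTTT", matrix, threshold=2.0):
        print(hit)
```

    In a study like the one above, windows passing the threshold would only be candidates; the abstract reports that chromatin immunoprecipitation confirmed binding at 15 of 32 predicted sites.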

  17. Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory

    Directory of Open Access Journals (Sweden)

    Gao Haichun

    2007-08-01

    Background: Large-scale sequencing of entire genomes has ushered in a new age in biology. One of the next grand challenges is to dissect the cellular networks consisting of many individual functional modules. Defining co-expression networks without ambiguity based on genome-wide microarray data is difficult and current methods are not robust and consistent with different data sets. This is particularly problematic for little understood organisms since not much existing biological knowledge can be exploited for determining the threshold to differentiate true correlation from random noise. Random matrix theory (RMT), which has been widely and successfully used in physics, is a powerful approach to distinguish system-specific, non-random properties embedded in complex systems from random noise. Here, we have hypothesized that the universal predictions of RMT are also applicable to biological systems and the correlation threshold can be determined by characterizing the correlation matrix of microarray profiles using random matrix theory. Results: Application of random matrix theory to microarray data of S. oneidensis, E. coli, yeast, A. thaliana, Drosophila, mouse and human indicates that there is a sharp transition of the nearest neighbour spacing distribution (NNSD) of the correlation matrix after gradually removing certain elements inside the matrix. Testing on an in silico modular model has demonstrated that this transition can be used to determine the correlation threshold for revealing modular co-expression networks. The co-expression network derived from yeast cell cycling microarray data is supported by gene annotation. The topological properties of the resulting co-expression network agree well with the general properties of biological networks. Computational evaluations have shown that the RMT approach is sensitive and robust. Furthermore, evaluation on sampled expression data of an in silico modular gene system has shown that under

  18. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

    Science.gov (United States)

    Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun

    2013-10-16

    The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining
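
    The three-step construction named in this abstract (gene-article matrix, normalized gene-MeSH term matrix, gene-gene dissimilarity matrix) can be illustrated with a toy sketch. It is not the GenoMesh implementation: the abstract does not say which of the six dissimilarity functions was selected, so cosine dissimilarity is swapped in as an assumption, and all gene, article and term names are placeholders.

```python
import math
from collections import Counter


def gene_mesh_profiles(gene_articles, article_mesh):
    """Normalized gene-to-term frequency profiles.

    gene_articles: dict mapping gene -> list of article ids mentioning it.
    article_mesh:  dict mapping article id -> list of indexed terms.
    """
    profiles = {}
    for gene, articles in gene_articles.items():
        counts = Counter()
        for article in articles:
            counts.update(article_mesh.get(article, ()))
        total = sum(counts.values())
        profiles[gene] = {t: c / total for t, c in counts.items()} if total else {}
    return profiles


def cosine_dissimilarity(p, q):
    """1 - cosine similarity of two sparse profiles (assumed metric)."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


if __name__ == "__main__":
    # Placeholder data, not taken from PubMed.
    gene_articles = {"geneA": ["a1", "a2"], "geneB": ["a2"], "geneC": ["a3"]}
    article_mesh = {"a1": ["term_x"], "a2": ["term_x", "term_y"], "a3": ["term_z"]}
    profiles = gene_mesh_profiles(gene_articles, article_mesh)
    for g1 in profiles:
        for g2 in profiles:
            if g1 < g2:
                print(g1, g2, cosine_dissimilarity(profiles[g1], profiles[g2]))
```

    Genes that share indexed terms end up with a small dissimilarity even when no single article mentions both, which is the sense in which such a matrix can suggest implicit gene-to-gene relationships.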

  19. Predicting survival outcomes using subsets of significant genes in prognostic marker studies with microarrays

    Directory of Open Access Journals (Sweden)

    Matsui Shigeyuki

    2006-03-01

    Background: Genetic markers hold great promise for refining our ability to establish precise prognostic prediction for diseases. The development of comprehensive gene expression microarray technology has allowed the selection of relevant marker genes from a large pool of candidate genes in early-phased, developmental prognostic marker studies. The primary analytical task in such studies is to select a small fraction of relevant genes, typically from a list of significant genes, for further investigation in subsequent studies. Results: We develop a methodology for predicting survival outcomes using subsets of significant genes in prognostic marker studies with microarrays. Key components in this methodology include building prediction models, assessing predictive performance of prediction models, and assessing significance of prediction results. As particular specifications, we assume Cox proportional hazard models with a compound covariate. For assessing predictive accuracy, we propose to use the cross-validated log partial likelihood. To assess significance of prediction results, we apply permutation procedures in cross-validated prediction. As an additional key component peculiar to prognostic prediction, we also consider incorporation of standard prognostic factors. The methodology is evaluated using both simulated and real data. Conclusion: The developed methodology for prognostic prediction using a subset of significant genes can provide new insights based on predictive capability, possibly incorporating standard prognostic factors, in selecting a fraction of relevant genes for subsequent studies.

  20. The effects of lorazepam on extrastriatal dopamine D(2/3)-receptors-A double-blind randomized placebo-controlled PET study.

    Science.gov (United States)

    Vilkman, Harry; Kajander, Jaana; Aalto, Sargo; Vahlberg, Tero; Någren, Kjell; Allonen, Topias; Syvälahti, Erkka; Hietala, Jarmo

    2009-11-30

    Lorazepam is a widely used anxiolytic drug of the benzodiazepine class. The clinical actions of benzodiazepines are thought to be mediated via specific allosteric benzodiazepine binding sites and enhancement of GABAergic neurotransmission in the brain. However, the indirect effects of benzodiazepines on other neurotransmitter systems have not been extensively studied. Previous experimental evidence suggests that benzodiazepines inhibit striatal dopamine release by enhancing the GABAergic inhibitory effect on dopamine neurons whereas very little is known about cortical or thalamic gamma-amino-butyric (GABA)-dopamine interactions during benzodiazepine administration. We explored the effects of lorazepam (a single 2.5 mg dose) on cortical and thalamic D(2/3) receptor binding using Positron-Emission Tomography (PET) and the high-affinity D(2/3)-receptor ligand [(11)C]FLB 457 in 12 healthy male volunteers. We used a randomized, double-blind and placebo-controlled study design. Dopamine D(2)/D(3) receptor binding potential was measured with the reference tissue method in several extrastriatal D(2)-receptor areas including frontal, parietal, temporal cortices and thalamus. The main subjective effect of lorazepam was sedation. Lorazepam induced a statistically significant decrease of D(2)/D(3) receptor BP(ND) in medial temporal and dorsolateral prefrontal cortex (DLPFC) that was also confirmed by a voxel-level analysis. The sedative effect of lorazepam was associated with a decrease in D(2)/D(3) receptor BP(ND) in the DLPFC. In conclusion, lorazepam decreased [(11)C]FLB 457 binding in frontal and temporal cortex, suggesting that cortical GABA-dopamine interaction may be involved in the central actions of lorazepam in healthy volunteers. The correlation between lorazepam-induced sedation and D(2)/D(3) receptor binding potential (BP) change further supports this hypothesis.

  1. A dual-tracer study of extrastriatal 6-[18F]fluoro-m-tyrosine and 6-[18F]-Fluoro-L-dopa uptake in Parkinson's disease

    Science.gov (United States)

    Li, Clarence; Palotti, Matthew; Holden, James E.; Oh, Jen; Okonkwo, Ozioma; Christian, Bradley T.; Bendlin, Barbara B.; Buyan-Dent, Laura; Harding, Sandra J.; Stone, Charles K.; Dejesus, Onofre T.; Nickles, Robert J.; Gallagher, Catherine L

    2014-01-01

    6-[18F]-Fluoro-L-dopa (FDOPA) has been widely used as a biomarker for catecholamine synthesis, storage, and metabolism; its intense uptake in the striatum, and fainter uptake in other brain regions, is correlated with the symptoms and pathophysiology of Parkinson's disease (PD). 6-[18F]fluoro-m-tyrosine (FMT), which also targets L-amino acid decarboxylase, has potential advantages over FDOPA as a radiotracer because it does not form catechol-O-methyltransferase (COMT) metabolites. The purpose of the present study was to compare the regional distribution of these radiotracers in the brains of PD patients. Fifteen Parkinson's patients were studied with FMT and FDOPA positron emission tomography (PET) as well as high-resolution structural magnetic resonance imaging (MRI). MRIs were automatically parcellated into neuroanatomical regions of interest (ROIs) in Freesurfer (http://surfer.nmr.mgh.harvard.edu); region-specific uptake rate constants (Kocc) were generated from coregistered PET using a Patlak graphical approach. The essential findings were as follows: (1) regional Kocc were highly correlated between the radiotracers and in agreement with previous FDOPA studies that used different ROI selection techniques; (2) FMT Kocc were higher in extrastriatal regions of relatively large uptake such as amygdala, pallidum, brainstem, hippocampus, entorhinal cortex, and thalamus, whereas cortical Kocc were similar between radiotracers; (3) while subcortical uptake of both radiotracers was related to disease duration and severity, cortical uptake was not. These results suggest that FMT may have advantages for examining pathologic changes within allocortical loop structures, which may contribute to cognitive and emotional symptoms of PD. PMID:24710997

  2. WeGET: predicting new genes for molecular systems by weighted co-expression

    NARCIS (Netherlands)

    Szklarczyk, R.; Megchelenbrink, W.; Cizek, P.; Ledent, M.; Velemans, G.; Szklarczyk, D.; Huynen, M.A.

    2016-01-01

    We have developed the Weighted Gene Expression Tool and database (WeGET, http://weget.cmbi.umcn.nl) for the prediction of new genes of a molecular system by correlated gene expression. WeGET utilizes a compendium of 465 human and 560 murine gene expression datasets that have been collected from

  3. Data Mining Strategy for "Gene Prediction" with Special Reference to Cotton Genome

    Institute of Scientific and Technical Information of China (English)

    KSHIRSAGAR Manali; BALASUBRAMANI G; SINGH Col Gurmit

    2008-01-01

    This paper presents an integrated approach towards solving the problem of "Gene Prediction". The "Gene Prediction" problem solving undergoes well defined stages starting with a DNA sequence as input, and lab treatment and computational analysis go hand in hand throughout the process. Many bioinformatics tools are available for analysis at different stages of "Gene Prediction", but a simplified and integrated approach is needed to support and speed up the task of a life scientist.

  4. Combinations of gene ontology and pathway characterize and predict prognosis genes for recurrence of gastric cancer after surgery.

    Science.gov (United States)

    Fan, Haiyan; Guo, Zhanjun; Wang, Cuijv

    2015-09-01

    Gastric cancer (GC) is the second leading cause of death from cancer globally. The most common cause of GC is the infection of Helicobacter pylori, but ∼11% of cases are caused by genetic factors. However, recurrences occur in approximately one-third of stage II GC patients, even if they are treated with adjuvant chemotherapy or chemoradiotherapy. This is potentially due to expression variation of genes; some candidate prognostic genes were identified in patients with high-risk recurrences. The objective of this study was to develop an effective computational method for meaningfully interpreting these GC-related genes and accurately predicting novel prognostic genes for high-risk recurrence patients. We employed properties of genes (gene ontology [GO] and KEGG pathway information) as features to characterize GC-related genes. We obtained an optimal set of features for interpreting these genes. By applying the minimum redundancy maximum relevance algorithm, we predicted the GC-related genes. With the same approach, we further predicted the genes for the prognosis of high-risk recurrence. We obtained 1104 GO terms and KEGG pathways and 530 GO terms and KEGG pathways, respectively, that characterized GC-related genes and recurrence-related genes well. Finally, three novel prognostic genes were predicted to help supplement genetic markers of high-risk GC patients for recurrence after surgery. An in-depth text mining indicated that the results are quite consistent with previous knowledge. Survival analysis of patients confirmed the novel prognostic genes as markers. By analyzing the related genes, we developed a systematic method to interpret the possible underlying mechanism of GC. The novel prognostic genes facilitate the understanding and therapy of GC recurrences after surgery.

  5. Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

    Institute of Scientific and Technical Information of China (English)

    Heng Li; Tao Liu; Hai-Hong Li; Yan Li; Li-Jun Fang; Hui-Min Xie; Wei-Mou Zheng; Bai-Lin Hao; Jin-Song Liu; Zhao Xu; Jiao Jin; Lin Fang; Lei Gao; Yu-Dong Li; Zi-Xing Xing; Shao-Gen Gao

    2005-01-01

    With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENSH and BGF. The predictions were compared on nucleotide, exon and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
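
    The abstract does not spell out its "commonly accepted measures". As a hedged illustration, the sketch below computes nucleotide-level sensitivity and specificity as they are usually defined in the gene-finding literature, where "specificity" is TP / (TP + FP). The exon coordinates are hypothetical and are not drawn from the rice test sets.

```python
def covered_positions(intervals):
    """Expand inclusive (start, end) exon intervals into a set of nucleotide positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    Sensitivity = TP / (TP + FN). Specificity follows the gene-finding
    convention TP / (TP + FP), which other fields call precision.
    """
    truth = covered_positions(annotated_exons)
    predicted = covered_positions(predicted_exons)
    true_positives = len(truth & predicted)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical single-gene example: 200 annotated and 200 predicted coding nucleotides,
# of which 150 overlap.
annotated = [(101, 200), (301, 400)]
predicted = [(151, 200), (301, 450)]
sn, sp = nucleotide_accuracy(annotated, predicted)
```

    Exon-level and whole-gene-level measures would compare exact interval boundaries instead of individual positions.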

  6. Hybrid SPR algorithm to select predictive genes for effectual cancer classification

    OpenAIRE

    2012-01-01

    Designing an automated system for classifying DNA microarray data is an extremely challenging problem because of its high dimension and low amount of sample data. In this paper, a hybrid statistical pattern recognition algorithm is proposed to reduce the dimensionality and select the predictive genes for the classification of cancer. Colon cancer gene expression profiles having 62 samples of 2000 genes were used for the experiment. A gene subset of 6 highly informative genes was selecte...

  7. Advances and perspectives in computational prediction of microbial gene essentiality

    NARCIS (Netherlands)

    Mobegi, Fredrick M; Zomer, Aldert; de Jonge, Marien I; van Hijum, Sacha A F T

    2017-01-01

    The minimal subset of genes required for cellular growth, survival and viability of an organism are classified as essential genes. Knowledge of essential genes gives insight into the core structure and functioning of a cell. This might lead to more efficient antimicrobial drug discovery, to elucidat

  8. Advances and perspectives in computational prediction of microbial gene essentiality

    NARCIS (Netherlands)

    Mobegi, Fredrick M; Zomer, Aldert; de Jonge, Marien I; van Hijum, Sacha A F T

    The minimal subset of genes required for cellular growth, survival and viability of an organism are classified as essential genes. Knowledge of essential genes gives insight into the core structure and functioning of a cell. This might lead to more efficient antimicrobial drug discovery, to

  9. Evaluation of the utility of gene expression and metabolic information for genomic prediction in maize.

    Science.gov (United States)

    Guo, Zhigang; Magwire, Michael M; Basten, Christopher J; Xu, Zhanyou; Wang, Daolong

    2016-12-01

    Predictive ability derived from gene expression and metabolic information was evaluated using genomic prediction methods based on datasets from a public maize panel. With the rapid development of high throughput biological technologies, information from gene expression and metabolites has received growing attention in plant genetics and breeding. In this study, we evaluated the utility of gene expression and metabolic information for genomic prediction using data obtained from a maize diversity panel. Our results show that, when used as predictor variables, gene expression levels and metabolite abundances provided reasonable predictive abilities relative to those based on genetic markers, although these values were not as large as those with genetic markers. Integrating gene expression levels and metabolite abundances with genetic markers significantly improved predictive abilities in comparison to the benchmark genomic best linear unbiased prediction model using genome-wide markers only. Predictive abilities based on gene expression and metabolites were trait-specific and were affected by the time of measurement and tissue samples as well as the number of genes and metabolites included in the model. In general, our results suggest that, rather than being conventionally used as intermediate phenotypes, gene expression and metabolic information can be used as predictors for genomic prediction and help improve genetic gains for complex traits in breeding programs.

  10. Pathogenic Network Analysis Predicts Candidate Genes for Cervical Cancer

    Directory of Open Access Journals (Sweden)

    Yun-Xia Zhang

    2016-01-01

    Purpose. The objective of our study was to predict candidate genes in cervical cancer (CC) using a network-based strategy and to understand the pathogenic process of CC. Methods. A pathogenic network of CC was extracted based on known pathogenic genes (seed genes) and differentially expressed genes (DEGs) between CC and normal controls. Subsequently, cluster analysis was performed to identify the subnetworks in the pathogenic network using ClusterONE. Each gene in the pathogenic network was assigned a weight value, and then candidate genes were obtained based on the weight distribution. Finally, pathway enrichment analysis for candidate genes was performed. Results. In this work, a total of 330 DEGs were identified between CC and normal controls. From the pathogenic network, 2 intensely connected clusters were extracted, and a total of 52 candidate genes were detected with weight values greater than 0.10. Among these candidate genes, VIM had the highest weight value. Moreover, candidate genes MMP1, CDC45, and CAT were, respectively, enriched in pathways in cancer, cell cycle, and methane metabolism. Conclusion. Candidate pathogenic genes including MMP1, CDC45, CAT, and VIM might be involved in the pathogenesis of CC. We believe that our results can provide theoretical guidelines for future clinical application.

  11. WebAUGUSTUS--a web service for training AUGUSTUS and predicting genes in eukaryotes.

    Science.gov (United States)

    Hoff, Katharina J; Stanke, Mario

    2013-07-01

    The prediction of protein coding genes is an important step in the annotation of newly sequenced and assembled genomes. AUGUSTUS is one of the most accurate tools for eukaryotic gene prediction. Here, we present WebAUGUSTUS, a web interface for training AUGUSTUS and predicting genes with AUGUSTUS. Depending on the needs of the user, WebAUGUSTUS generates training gene structures automatically. Besides a genome file, either a file with expressed sequence tags or a file with protein sequences is required for this step. Alternatively, it is possible to submit an externally generated training gene structure file and a genome file. The web service optimizes AUGUSTUS parameters and predicts genes with those parameters. WebAUGUSTUS is available at http://bioinf.uni-greifswald.de/webaugustus.

  12. Network-based gene prediction for Plasmodium falciparum malaria towards genetics-based drug discovery.

    Science.gov (United States)

    Chen, Yang; Xu, Rong

    2015-01-01

    Malaria is the most deadly parasitic infectious disease. Existing drug treatments have limited efficacy in malaria elimination, and the complex pathogenesis of the disease is not fully understood. Detecting novel malaria-associated genes not only contributes to revealing the disease pathogenesis, but also facilitates discovering new targets for anti-malaria drugs. In this study, we developed a network-based approach to predict malaria-associated genes. We constructed a cross-species network to integrate human-human, parasite-parasite and human-parasite protein interactions. Then we extended the random walk algorithm on this network, and used known malaria genes as the seeds to find novel candidate genes for malaria. We validated our algorithms using 77 known malaria genes: 14 human genes and 63 parasite genes were ranked on average within the top 2% and top 4%, respectively, among the human and parasite genomes. We also evaluated our method for predicting novel malaria genes using a set of 27 genes with literature supporting evidence. Our approach ranked 12 genes within the top 1% and 24 genes within the top 5%. In addition, we demonstrated that top-ranked candidate genes were enriched for drug targets, and identified commonalities underlying top-ranked malaria genes through pathway analysis. In summary, the candidate malaria-associated genes predicted by our data-driven approach have the potential to guide genetics-based anti-malaria drug discovery.
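
    The seed-based ranking described above can be sketched with a plain random walk with restart. This is a generic version, not the authors' cross-species extension, whose details the abstract does not give. The graph, node names and restart probability of 0.3 are assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return visiting probabilities for every node of an undirected graph.

    adjacency: dict mapping each node to a list of neighbours (every node needs
    at least one neighbour so that probability mass is conserved).
    seeds: set of nodes the walker jumps back to with probability `restart`.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            # Spread the non-restarting mass of n evenly over its neighbours.
            share = (1.0 - restart) * current[n] / len(adjacency[n])
            for neighbour in adjacency[n]:
                updated[neighbour] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical path-shaped network: seed - gene_a - gene_b - gene_c.
toy_network = {
    "known_malaria_gene": ["gene_a"],
    "gene_a": ["known_malaria_gene", "gene_b"],
    "gene_b": ["gene_a", "gene_c"],
    "gene_c": ["gene_b"],
}
scores = random_walk_with_restart(toy_network, seeds={"known_malaria_gene"})
ranking = sorted(
    (n for n in toy_network if n != "known_malaria_gene"),
    key=lambda n: scores[n],
    reverse=True,
)
```

    Candidates closer to the seed in the network receive higher scores, which is the property a seed-based prioritisation relies on.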

  13. ETS Gene Fusions as Predictive Biomarkers of Resistance to Radiation Therapy for Prostate Cancer

    Science.gov (United States)

    2015-10-01

    Award Number: W81XWH-10-1-0582 TITLE: ETS Gene Fusions as Predictive Biomarkers of Resistance to Radiation Therapy for Prostate Cancer PRINCIPAL...ETS gene fusion status associated with clinical outcomes following radiation therapy, by analyzing both the collected biomarker and clinical data...denotes absence of an ERG fusion). ETS gene fusions status did not predict outcomes following radiation therapy, as demonstrated by Kaplan Meier

  14. Comparison of gene sets for expression profiling: prediction of metastasis from low-malignant breast cancer

    DEFF Research Database (Denmark)

    Thomassen, Mads; Tan, Qihua; Eiriksdottir, Freyja;

    2007-01-01

    -six tumors from low-risk patients and 34 low-malignant T2 tumors from patients with slightly higher risk have been examined by genome-wide gene expression analysis. Nine prognostic gene sets were tested in this data set. RESULTS: A 32-gene profile (HUMAC32) that accurately predicts metastasis has previously...... sets, mainly developed in high-risk cancers, predict metastasis from low-malignant cancer....

  15. Prediction of highly expressed genes in microbes based on chromatin accessibility

    DEFF Research Database (Denmark)

    Willenbrock, Hanni; Ussery, David

    2007-01-01

    BACKGROUND: It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed...... and ribosomal RNA are encoded by DNA having significantly lower position preference values than other genes in fast-replicating microbes. CONCLUSION: This insight into DNA structure-dependent gene expression in microbes may be exploited for predicting the expression of non-translated genes such as non...

  16. Prediction of drug-drug interactions from chemogenomic and gene-gene interactions and analysis of drug-drug interactions

    OpenAIRE

    2013-01-01

    The interactions between multiple drugs administered to an organism concurrently, whether in the form of synergy or antagonism, are of clinical relevance. Moreover, understanding the mechanisms and nature of drug-drug interactions is of great practical and theoretical interest. Work has previously been done on gene-gene and gene-drug interactions, but the prediction and rationalization of drug-drug interactions from this data is not straightforward. We present a strategy for attacking this p...

  17. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification

    DEFF Research Database (Denmark)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding......, including prediction of gene cluster boundaries using the ClusterFinder method or the newly integrated CASSIS algorithm, improved substrate specificity prediction for non-ribosomal peptide synthetase adenylation domains based on the new SANDPUMA algorithm, improved predictions for terpene and ribosomally...

  18. Prediction of Tumor Outcome Based on Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    Liu Juan; Hitoshi Iba

    2004-01-01

    Gene expression microarray data can be used to classify tumor types. We proposed a new procedure to classify human tumor samples based on microarray gene expressions by using a hybrid supervised learning method called MOEA+WV (Multi-Objective Evolutionary Algorithm+Weighted Voting). MOEA is used to search for relatively few subsets of informative genes from the high-dimensional gene space, and WV is used as a classification tool. This new method has been applied to predict the subtypes of lymphoma and outcomes of medulloblastoma. The results are relatively accurate and meaningful compared to those from other methods.
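
    The weighted voting (WV) step can be sketched as follows. The sketch assumes the widely used signal-to-noise formulation, in which each gene votes with weight (mean1 - mean2) / (sd1 + sd2) relative to the midpoint of the two class means; the abstract does not confirm that MOEA+WV uses exactly this form. The class labels and expression values are hypothetical, and the gene subset is taken as given rather than searched by an evolutionary algorithm.

```python
from statistics import mean, stdev


def train_weighted_voting(samples, labels):
    """Fit per-gene (weight, midpoint) pairs for a two-class problem.

    samples: list of expression vectors, one per sample.
    labels: one class label per sample; exactly two distinct labels are expected,
    each with at least two samples and non-constant expression per gene.
    """
    first, second = sorted(set(labels))
    params = []
    for g in range(len(samples[0])):
        x_first = [s[g] for s, y in zip(samples, labels) if y == first]
        x_second = [s[g] for s, y in zip(samples, labels) if y == second]
        weight = (mean(x_first) - mean(x_second)) / (stdev(x_first) + stdev(x_second))
        midpoint = (mean(x_first) + mean(x_second)) / 2
        params.append((weight, midpoint))
    return params, (first, second)


def predict_weighted_voting(params, class_names, sample):
    """Sum the signed votes of all genes; a positive total favours the first class."""
    total = sum(w * (x - mid) for (w, mid), x in zip(params, sample))
    return class_names[0] if total > 0 else class_names[1]


# Hypothetical two-gene training set with two tumor subtypes.
train_x = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5], [1.0, 5.0], [2.0, 6.0], [1.5, 7.0]]
train_y = ["subtype_A", "subtype_A", "subtype_A", "subtype_B", "subtype_B", "subtype_B"]
wv_params, wv_classes = train_weighted_voting(train_x, train_y)
call_1 = predict_weighted_voting(wv_params, wv_classes, [6.5, 1.0])
call_2 = predict_weighted_voting(wv_params, wv_classes, [1.0, 6.5])
```

    Genes whose class means are far apart relative to their spread get larger weights, so a small informative subset can dominate the vote.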

  19. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to

  20. Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data

    OpenAIRE

    Teng Shaolei; Yang Jack Y; Wang Liangjiang

    2013-01-01

    Background: Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. Results: In this study,...

  1. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome...... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms......-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation....

  2. Improve Survival Prediction Using Principal Components of Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    Yi-Jing Shen; Shu-Guang Huang

    2006-01-01

    The purpose of many microarray studies is to find the association between gene expression and sample characteristics such as treatment type or sample phenotype. There has been a surge of efforts developing different methods for delineating the association. Aside from the high dimensionality of microarray data, one well recognized challenge is the fact that genes could be complicatedly inter-related, thus making many statistical methods inappropriate to use directly on the expression data. Multivariate methods such as principal component analysis (PCA) and clustering are often used as a part of the effort to capture the gene correlation, and the derived components or clusters are used to describe the association between gene expression and sample phenotype. We propose a method for patient population dichotomization using maximally selected test statistics in combination with the PCA method, which shows favorable results. The proposed method is compared with a currently well-recognized method.

  3. Predictive value of MSH2 gene expression in colorectal cancer treated with capecitabine

    DEFF Research Database (Denmark)

    Jensen, Lars H; Danenberg, Kathleen D; Danenberg, Peter V;

    2007-01-01

    was associated with a hazard ratio of 0.5 (95% confidence interval, 0.23-1.11; P = 0.083) in survival analysis. CONCLUSION: The higher gene expression of MSH2 in responders and the trend for predicting overall survival indicates a predictive value of this marker in the treatment of advanced CRC with capecitabine.......PURPOSE: The objective of the present study was to evaluate the gene expression of the DNA mismatch repair gene MSH2 as a predictive marker in advanced colorectal cancer (CRC) treated with first-line capecitabine. PATIENTS AND METHODS: Microdissection of paraffin-embedded tumor tissue, RNA...

  4. Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data.

    Science.gov (United States)

    Teng, Shaolei; Yang, Jack Y; Wang, Liangjiang

    2013-01-01

    Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression.

  5. Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile

    Institute of Scientific and Technical Information of China (English)

    GAO Lei; LI Xia; GUO Zheng; ZHU MingZhu; LI YanHui; RAO ShaoQi

    2007-01-01

    GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to "biology process" by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

  6. Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

  7. Prediction of drought-resistant genes in Arabidopsis thaliana using SVM-RFE.

    Directory of Open Access Journals (Sweden)

    Yanchun Liang

    BACKGROUND: Identifying genes with essential roles in resisting environmental stress rates high in agronomic importance. Although massive DNA microarray gene expression data have been generated for plants, current computational approaches underutilize these data for studying genotype-trait relationships. Some advanced gene identification methods have been explored for human diseases, but typically these methods have not been converted into publicly available software tools and cannot be applied to plants for identifying genes with agronomic traits. METHODOLOGY: In this study, we used 22 sets of Arabidopsis thaliana gene expression data from GEO to predict the key genes involved in water tolerance. We applied an SVM-RFE (Support Vector Machine-Recursive Feature Elimination) feature selection method for the prediction. To address small sample sizes, we developed a modified approach for SVM-RFE by using bootstrapping and leave-one-out cross-validation. We also expanded our study to predict genes involved in water susceptibility. CONCLUSIONS: We analyzed the top 10 genes predicted to be involved in water tolerance. Seven of them are connected to known biological processes in drought resistance. We also analyzed the top 100 genes in terms of their biological functions. Our study shows that the SVM-RFE method is a highly promising method in analyzing plant microarray data for studying genotype-phenotype relationships. The software is freely available with source code at http://ccst.jlu.edu.cn/JCSB/RFET/.

  8. Computational prediction of microRNA genes in silkworm genome

    Institute of Scientific and Technical Information of China (English)

    TONG Chuan-zhou; JIN Yong-feng; ZHANG Yao-zhou

    2006-01-01

    MicroRNAs (miRNAs) constitute a novel, extensive class of small RNAs (~21 nucleotides), and play important gene-regulation roles during growth and development in various organisms. Here we conducted a homology search to identify homologs of previously validated miRNAs from silkworm genome. We identified 24 potential miRNA genes, and gave each of them a name according to the common criteria. Interestingly, we found that a great number of newly identified miRNAs were conserved in silkworm and Drosophila, and family alignment revealed that miRNA families might possess single nucleotide polymorphisms. miRNA gene clusters and possible functions of complement miRNA pairs are discussed.

  9. A brain region-specific predictive gene map for autism derived by profiling a reference gene set.

    Directory of Open Access Journals (Sweden)

    Ajay Kumar

    Molecular underpinnings of complex psychiatric disorders such as autism spectrum disorders (ASD) remain largely unresolved. Increasingly, structural variations in discrete chromosomal loci are implicated in ASD, expanding the search space for its disease etiology. We exploited the high genetic heterogeneity of ASD to derive a predictive map of candidate genes by an integrated bioinformatics approach. Using a reference set of 84 Rare and Syndromic candidate ASD genes (AutRef84), we built a composite reference profile based on both functional and expression analyses. First, we created a functional profile of AutRef84 by performing Gene Ontology (GO) enrichment analysis which encompassed three main areas: 1) neurogenesis/projection, 2) cell adhesion, and 3) ion channel activity. Second, we constructed an expression profile of AutRef84 by conducting DAVID analysis which found enrichment in brain regions critical for sensory information processing (olfactory bulb, occipital lobe), executive function (prefrontal cortex), and hormone secretion (pituitary). Disease specificity of this dual AutRef84 profile was demonstrated by comparative analysis with control, diabetes, and non-specific gene sets. We then screened the human genome with the dual AutRef84 profile to derive a set of 460 potential ASD candidate genes. Importantly, the power of our predictive gene map was demonstrated by capturing 18 existing ASD-associated genes which were not part of the AutRef84 input dataset. The remaining 442 genes are entirely novel putative ASD risk genes. Together, we used a composite ASD reference profile to generate a predictive map of novel ASD candidate genes which should be prioritized for future research.

  10. Gentrepid V2.0: A web server for candidate disease gene prediction

    NARCIS (Netherlands)

    Ballouz, S.; Liu, J.Y.; George, R.A.; Bains, N.; Liu, A.; Oti, M.O.; Gaeta, B.; Fatkin, D.; Wouters, M.A.

    2013-01-01

    BACKGROUND: Candidate disease gene prediction is a rapidly developing area of bioinformatics research with the potential to deliver great benefits to human health. As experimental studies detecting associations between genetic intervals and disease proliferate, better bioinformatic techniques that c

  11. PPARgene: A Database of Experimentally Verified and Computationally Predicted PPAR Target Genes.

    Science.gov (United States)

    Fang, Li; Zhang, Man; Li, Yanhui; Liu, Yan; Cui, Qinghua; Wang, Nanping

    2016-01-01

    The peroxisome proliferator-activated receptors (PPARs) are ligand-activated transcription factors of the nuclear receptor superfamily. Upon ligand binding, PPARs activate target gene transcription and regulate a variety of important physiological processes such as lipid metabolism, inflammation, and wound healing. Here, we describe the first database of PPAR target genes, PPARgene. Among the 225 experimentally verified PPAR target genes, 83 are for PPARα, 83 are for PPARβ/δ, and 104 are for PPARγ. Detailed information including tissue types, species, and reference PubMed IDs was also provided. In addition, we developed a machine learning method to predict novel PPAR target genes by integrating in silico PPAR-responsive element (PPRE) analysis with high throughput gene expression data. Fivefold cross validation showed that the performance of this prediction method was significantly improved compared to the in silico PPRE analysis method. The prediction tool is also implemented in the PPARgene database.

  12. Neural network predicts sequence of TP53 gene based on DNA chip

    DEFF Research Database (Denmark)

    Spicker, J.S.; Wikman, F.; Lu, M.L.;

    2002-01-01

    We have trained an artificial neural network to predict the sequence of the human TP53 tumor suppressor gene based on a p53 GeneChip. The trained neural network uses as input the fluorescence intensities of DNA hybridized to oligonucleotides on the surface of the chip and makes between zero...... and four errors in the predicted 1300 bp sequence when tested on wild-type TP53 sequence....

  13. Algorithm for Finding Optimal Gene Sets in Microarray Prediction

    CERN Document Server

    Deutsch, J M

    2001-01-01

    Motivation: Microarray data has recently been shown to be efficacious in distinguishing closely related cell types that often appear in the diagnosis of cancer. It is useful to determine the minimum number of genes needed to do such a diagnosis both for clinical use and to determine the importance of specific genes for cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, all using different combinations of genes to generate a set of optimal predictors. Results: We apply this method to the leukemia data of the Whitehead/MIT group that attempts to differentially diagnose two kinds of leukemia, and also to data of Khan et al. to distinguish four different kinds of childhood cancers. In the latter case we were able to reduce the number of genes needed from 96 down to 15, while at the same time being able to perfectly classify all of their test data. Availability: http://stravinsky.ucsc.edu/josh/gesses/ Contact: josh@physics.ucsc.edu

  14. Analysis and prediction of gene splice sites in four Aspergillus genomes

    DEFF Research Database (Denmark)

    Wang, Kai; Ussery, David; Brunak, Søren

    2009-01-01

    , splice site prediction program called NetAspGene, for the genus Aspergillus. Gene sequences from Aspergillus fumigatus, the most common mould pathogen, were used to build and test our model. Compared to many animals and plants, Aspergillus contains smaller introns; thus we have applied a larger window...

  15. A signature inferred from Drosophila mitotic genes predicts survival of breast cancer patients.

    Directory of Open Access Journals (Sweden)

    Christian Damasco

    INTRODUCTION: The classification of breast cancer patients into risk groups provides a powerful tool for the identification of patients who will benefit from aggressive systemic therapy. The analysis of microarray data has generated several gene expression signatures that improve diagnosis and allow risk assessment. There is also evidence that cell proliferation-related genes have a high predictive power within these signatures. METHODS: We thus constructed a gene expression signature (the DM signature) using the human orthologues of 108 Drosophila melanogaster genes required for either the maintenance of chromosome integrity (36 genes) or mitotic division (72 genes). RESULTS: The DM signature has minimal overlap with the extant signatures and is highly predictive of survival in 5 large breast cancer datasets. In addition, we show that the DM signature outperforms many widely used breast cancer signatures in predictive power, and performs comparably to other proliferation-based signatures. For most genes of the DM signature, an increased expression is negatively correlated with patient survival. The genes that provide the highest contribution to the predictive power of the DM signature are those involved in cytokinesis. CONCLUSION: This finding highlights cytokinesis as an important marker in breast cancer prognosis and as a possible target for antimitotic therapies.

  16. Prediction of molecular subtypes in acute myeloid leukemia based on gene expression profiling

    NARCIS (Netherlands)

    R.G.W. Verhaak (Roel); B.J. Wouters (Bas); C.A.J. Erpelinck (Claudia); S. Abbas (Saman); H.B. Beverloo (Berna); S. Lugthart (Sanne); B. Löwenberg (Bob); H.R. Delwel (Ruud); P.J.M. Valk (Peter)

    2009-01-01

    We examined the gene expression profiles of two independent cohorts of patients with acute myeloid leukemia [n=247 and n=214 (younger than or equal to 60 years)] to study the applicability of gene expression profiling as a single assay in prediction of acute myeloid leukemia-specific mol

  17. Computational prediction of essential genes in an unculturable endosymbiotic bacterium, Wolbachia of Brugia malayi

    Directory of Open Access Journals (Sweden)

    Carlow Clotilde KS

    2009-11-01

    Background: Wolbachia (wBm) is an obligate endosymbiotic bacterium of Brugia malayi, a parasitic filarial nematode of humans and one of the causative agents of lymphatic filariasis. There is a pressing need for new drugs against filarial parasites, such as B. malayi. As wBm is required for B. malayi development and fertility, targeting wBm is a promising approach. However, the lifecycle of neither B. malayi nor wBm can be maintained in vitro. To facilitate selection of potential drug targets we computationally ranked the wBm genome based on confidence that a particular gene is essential for the survival of the bacterium. Results: wBm protein sequences were aligned using BLAST to the Database of Essential Genes (DEG) version 5.2, a collection of 5,260 experimentally identified essential genes in 15 bacterial strains. A confidence score, the Multiple Hit Score (MHS), was developed to predict each wBm gene's essentiality based on the top alignments to essential genes in each bacterial strain. This method was validated using a jackknife methodology to test the ability to recover known essential genes in a control genome. A second estimation of essentiality, the Gene Conservation Score (GCS), was calculated on the basis of phyletic conservation of genes across Wolbachia's parent order Rickettsiales. Clusters of orthologous genes were predicted within the 27 currently available complete genomes. Druggability of wBm proteins was predicted by alignment to a database of protein targets of known compounds. Conclusion: Ranking wBm genes by either MHS or GCS predicts and prioritizes potentially essential genes. Comparison of the MHS to GCS produces quadrants representing four types of predictions: those with high confidence of essentiality by both methods (245 genes), those highly conserved across Rickettsiales (299 genes), those similar to distant essential genes (8 genes), and those with low confidence of essentiality (253 genes). These data facilitate

  18. Prediction of molecular subtypes in acute myeloid leukemia based on gene expression profiling.

    Science.gov (United States)

    Verhaak, Roel G W; Wouters, Bas J; Erpelinck, Claudia A J; Abbas, Saman; Beverloo, H Berna; Lugthart, Sanne; Löwenberg, Bob; Delwel, Ruud; Valk, Peter J M

    2009-01-01

    We examined the gene expression profiles of two independent cohorts of patients with acute myeloid leukemia [n=247 and n=214 (younger than or equal to 60 years)] to study the applicability of gene expression profiling as a single assay in prediction of acute myeloid leukemia-specific molecular subtypes. The favorable cytogenetic acute myeloid leukemia subtypes, i.e., acute myeloid leukemia with t(8;21), t(15;17) or inv(16), were predicted with maximum accuracy (positive and negative predictive value: 100%). Mutations in NPM1 and CEBPA were predicted less accurately (positive predictive value: 66% and 100%, and negative predictive value: 99% and 97% respectively). Various other characteristic molecular acute myeloid leukemia subtypes, i.e., mutant FLT3 and RAS, abnormalities involving 11q23, -5/5q-, -7/7q-, abnormalities involving 3q (abn3q) and t(9;22), could not be correctly predicted using gene expression profiling. In conclusion, gene expression profiling allows accurate prediction of certain acute myeloid leukemia subtypes, e.g. those characterized by expression of chimeric transcription factors. However, detection of mutations affecting signaling molecules and numerical abnormalities still requires alternative molecular methods.
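
    The positive and negative predictive values quoted above follow directly from confusion-matrix counts. A minimal sketch of that arithmetic, with entirely hypothetical counts (they are not taken from the study):

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): fraction of predicted positives that are true.
    NPV = TN / (TN + FN): fraction of predicted negatives that are true.
    """
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical counts for a single subtype classifier (illustration only).
ppv, npv = predictive_values(tp=20, fp=0, tn=190, fn=4)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

    A classifier with no false positives has a PPV of 100% however many true cases it misses, which is why PPV and NPV are reported together.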

  19. Identifying Gene Regulatory Networks in Arabidopsis by In Silico Prediction, Yeast-1-Hybrid, and Inducible Gene Profiling Assays.

    Science.gov (United States)

    Sparks, Erin E; Benfey, Philip N

    2016-01-01

    A system-wide understanding of gene regulation will provide deep insights into plant development and physiology. In this chapter we describe a threefold approach to identify the gene regulatory networks in Arabidopsis thaliana that function in a specific tissue or biological process. Since no single method is sufficient to establish comprehensive and high-confidence gene regulatory networks, we focus on the integration of three approaches. First, we describe an in silico prediction method of transcription factor-DNA binding, then an in vivo assay of transcription factor-DNA binding by yeast-1-hybrid and lastly the identification of co-expression clusters by transcription factor induction in planta. Each of these methods provides a unique tool to advance our understanding of gene regulation, and together provide a robust model for the generation of gene regulatory networks.

  20. Predicting Variabilities in Cardiac Gene Expression with a Boolean Network Incorporating Uncertainty.

    Science.gov (United States)

    Grieb, Melanie; Burkovski, Andre; Sträng, J Eric; Kraus, Johann M; Groß, Alexander; Palm, Günther; Kühl, Michael; Kestler, Hans A

    2015-01-01

    Gene interactions in cells can be represented by gene regulatory networks. A Boolean network models gene interactions according to rules where gene expression is represented by binary values (on / off or {1, 0}). In reality, however, the gene's state can have multiple values due to biological properties. Furthermore, the noisy nature of the experimental design results in uncertainty about a state of the gene. Here we present a new Boolean network paradigm to allow intermediate values on the interval [0, 1]. As in the Boolean network, fixed points or attractors of such a model correspond to biological phenotypes or states. We use our new extension of the Boolean network paradigm to model gene expression in first and second heart field lineages which are cardiac progenitor cell populations involved in early vertebrate heart development. By this we are able to predict additional biological phenotypes that the Boolean model alone is not able to identify without utilizing additional biological knowledge. The additional phenotypes predicted by the model were confirmed by published biological experiments. Furthermore, the new method predicts gene expression propensities for modelled but yet to be analyzed genes.
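
    For readers unfamiliar with the classical Boolean network that this work extends, the following sketch enumerates the fixed points of a toy network under synchronous updating. The three genes and their rules are hypothetical and are not the cardiac model from the paper; the extension to intermediate values on [0, 1] is not reproduced here.

```python
from itertools import product

# Hypothetical toy network (illustration only): A and B repress each other,
# and C is switched on by A.
RULES = {
    "A": lambda s: s["A"] and not s["B"],
    "B": lambda s: s["B"] and not s["A"],
    "C": lambda s: s["A"],
}
GENES = sorted(RULES)


def step(state):
    """Apply every rule synchronously to a tuple of 0/1 gene states."""
    s = dict(zip(GENES, state))
    return tuple(int(bool(RULES[g](s))) for g in GENES)


def fixed_points():
    """Return all states that map to themselves (single-state attractors)."""
    return [st for st in product((0, 1), repeat=len(GENES)) if step(st) == st]


for fp in fixed_points():
    print(dict(zip(GENES, fp)))
```

    Exhaustive enumeration is feasible only for small networks, since the state space grows as 2^n in the number of genes.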

  1. Enhancing the Lasso Approach for Developing a Survival Prediction Model Based on Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Shuhei Kaneko

    2015-01-01

    In the past decade, researchers in oncology have sought to develop survival prediction models using gene expression data. The least absolute shrinkage and selection operator (lasso) has been widely used to select genes that are truly correlated with a patient's survival. The lasso selects genes for prediction by shrinking a large number of coefficients of the candidate genes towards zero based on a tuning parameter that is often determined by cross-validation (CV). However, this method can pass over (or fail to identify) true positive genes (i.e., it identifies false negatives) in certain instances, because the lasso tends to favor the development of a simple prediction model. Here, we attempt to monitor the identification of false negatives by developing a method for estimating the number of true positive (TP) genes for a series of values of a tuning parameter that assumes a mixture distribution for the lasso estimates. Using our developed method, we performed a simulation study to examine its precision in estimating the number of TP genes. Additionally, we applied our method to a real gene expression dataset and found that it was able to identify genes correlated with survival that a CV method was unable to detect.
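
    The shrinkage behaviour described above can be illustrated with a small coordinate-descent lasso. This is a sketch for ordinary linear regression, assumed here for simplicity; it is not the survival model or the TP-estimation method of the study, and the data are made up. It shows how a weakly associated predictor is set exactly to zero once the tuning parameter `lam` exceeds its signal, which is how a true positive can become a false negative.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Made-up data: predictor 0 has a strong effect, predictor 1 a weak one.
X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
y = [2.0, -2.0, 0.1, -0.1]
print(lasso_cd(X, y, lam=0.2))  # strong effect shrunk, weak effect dropped
```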

  2. Prediction of gene-phenotype associations in humans, mice, and plants using phenologs.

    Science.gov (United States)

    Woods, John O; Singh-Blom, Ulf Martin; Laurent, Jon M; McGary, Kriston L; Marcotte, Edward M

    2013-06-21

    Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and may be used to predict additional candidate disease genes. In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data--from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans--establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene-phenotype associations, as for the Arabidopsis response to vernalization phenotype. We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.

  3. Combined effects of thrombosis pathway gene variants predict cardiovascular events.

    Directory of Open Access Journals (Sweden)

    Kirsi Auro

    2007-07-01

    The genetic background of complex diseases is proposed to consist of several low-penetrance risk loci. Addressing this complexity likely requires both large sample size and simultaneous analysis of different predisposing variants. We investigated the role of four thrombosis genes: coagulation factor V (F5), intercellular adhesion molecule 1 (ICAM1), protein C (PROC), and thrombomodulin (THBD) in cardiovascular diseases. Single allelic gene variants and their pair-wise combinations were analyzed in two independently sampled population cohorts from Finland. From among 14,140 FINRISK participants (FINRISK-92, n = 5,999 and FINRISK-97, n = 8,141), we selected for genotyping a sample of 2,222, including 528 incident cardiovascular disease (CVD) cases and random subcohorts totaling 786. To cover all known common haplotypes (>10%), 54 single nucleotide polymorphisms (SNPs) were genotyped. Classification-tree analysis identified 11 SNPs that were further analyzed in Cox's proportional hazard model as single variants and pair-wise combinations. Multiple testing was controlled by use of two independent cohorts and with false-discovery rate. Several CVD risk variants were identified: In women, the combination of F5 rs7542281 x THBD rs1042580, together with three single F5 SNPs, was associated with CVD events. Among men, PROC rs1041296, when combined with either ICAM1 rs5030341 or F5 rs2269648, was associated with total mortality. As a single variant, PROC rs1401296, together with the F5 Leiden mutation, was associated with ischemic stroke events. Our strategy to combine the classification-tree analysis with more traditional genetic models was successful in identifying SNPs, acting either in combination or as single variants, predisposing to CVD, and produced consistent results in two independent cohorts. These results suggest that variants in these four thrombosis genes contribute to arterial cardiovascular events at population level.
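
    The abstract does not say which false-discovery-rate procedure was used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure and applies it to made-up p-values.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values for ten hypothetical SNP tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))
```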

  4. Accurate prediction of secondary metabolite gene clusters in filamentous fungi

    DEFF Research Database (Denmark)

    Andersen, Mikael Rørdam; Nielsen, Jakob Blæsbjerg; Klitgaard, Andreas

    2013-01-01

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify suppo...... used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom....

  5. An endometrial gene expression signature accurately predicts recurrent implantation failure after IVF

    Science.gov (United States)

    Koot, Yvonne E. M.; van Hooff, Sander R.; Boomsma, Carolien M.; van Leenen, Dik; Groot Koerkamp, Marian J. A.; Goddijn, Mariëtte; Eijkemans, Marinus J. C.; Fauser, Bart C. J. M.; Holstege, Frank C. P.; Macklon, Nick S.

    2016-01-01

    The primary limiting factor for effective IVF treatment is successful embryo implantation. Recurrent implantation failure (RIF) is a condition whereby couples fail to achieve pregnancy despite consecutive embryo transfers. Here we describe the collection of gene expression profiles from mid-luteal phase endometrial biopsies (n = 115) from women experiencing RIF and healthy controls. Using a signature discovery set (n = 81) we identify a signature containing 303 genes predictive of RIF. Independent validation in 34 samples shows that the gene signature predicts RIF with 100% positive predictive value (PPV). The strength of the RIF associated expression signature also stratifies RIF patients into distinct groups with different subsequent implantation success rates. Exploration of the expression changes suggests that RIF is primarily associated with reduced cellular proliferation. The gene signature will be of value in counselling and guiding further treatment of women who fail to conceive upon IVF and suggests new avenues for developing intervention. PMID:26797113

  6. SCGPred: A Score-based Method for Gene Structure Prediction by Combining Multiple Sources of Evidence

    Institute of Scientific and Technical Information of China (English)

    Xiao Li; Qingan Ren; Yang Weng; Haoyang Cai; Yunmin Zhu; Yizheng Zhang

    2008-01-01

    Predicting protein-coding genes remains a significant challenge. Although a variety of computational programs that commonly use machine learning methods have emerged, prediction accuracy remains low when they are applied to large genomic sequences. Moreover, computational gene finding in newly sequenced genomes is an especially difficult task due to the absence of a training set of abundant validated genes. Here we present a new gene-finding program, SCGPred, to improve the accuracy of prediction by combining multiple sources of evidence. SCGPred can apply both a supervised method in previously well-studied genomes and an unsupervised one in novel genomes. By testing with datasets composed of large DNA sequences from human and a novel genome of Ustilago maydis, SCGPred gains a significant improvement in comparison to the popular ab initio gene predictors. We also demonstrate that SCGPred can significantly improve prediction in novel genomes by combining several foreign gene finders with similarity alignments, which is superior to other unsupervised methods. Therefore, SCGPred can serve as an alternative gene-finding tool for newly sequenced eukaryotic genomes. The program is freely available at http://bio.scu.edu.cn/SCGPred/.

  7. Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes.

    Directory of Open Access Journals (Sweden)

    Daniel S Himmelstein

    2015-07-01

    The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important derivations will be the translation of this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks--graphs with multiple node and edge types--for accomplishing both tasks. First we constructed a network with 18 node types--genes, diseases, tissues, pathophysiologies, and 14 MSigDB (molecular signatures database) collections--and 19 edge types from high-throughput publicly-available resources. From this network composed of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model from GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as influential mechanisms explaining pathogenesis. Our method successfully predicted the results (with AUROC = 0.79) from a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io). Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach
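
    The AUROC reported above can be read as the probability that a randomly chosen true association receives a higher predicted score than a randomly chosen non-association. A minimal sketch of that pairwise definition, using made-up scores and labels rather than any data from the study:

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    labels: 1 for a true association, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities for six hypothetical gene-disease pairs.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(auroc(scores, labels), 3))
```

    The pairwise form is quadratic in the number of examples; for large prediction sets a rank-based computation gives the same value more cheaply.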

  8. Manteia, a predictive data mining system for vertebrate genes and its applications to human genetic diseases.

    Science.gov (United States)

    Tassy, Olivier; Pourquié, Olivier

    2014-01-01

    The function of genes is often evolutionarily conserved, and comparing the annotation of ortholog genes in different model organisms has proved to be a powerful predictive tool to identify the function of human genes. Here, we describe Manteia, a resource available online at http://manteia.igbmc.fr. Manteia allows the comparison of embryological, expression, molecular and etiological data from human, mouse, chicken and zebrafish simultaneously to identify new functional and structural correlations and gene-disease associations. Manteia is particularly useful for the analysis of gene lists produced by high-throughput techniques such as microarrays or proteomics. Data can be easily analyzed statistically to characterize the function of groups of genes and to correlate the different aspects of their annotation. Sophisticated querying tools provide unlimited ways to merge the information contained in Manteia along with the possibility of introducing custom user-designed biological questions into the system. This makes it possible, for example, to connect all the animal experimental results and annotations to the human genome, and to take advantage of data not available for human to look for candidate genes responsible for genetic disorders. Here, we demonstrate the predictive and analytical power of the system to predict candidate genes responsible for human genetic diseases.

  9. Predicting Polymerase Ⅱ Core Promoters by Cooperating Transcription Factor Binding Sites in Eukaryotic Genes

    Institute of Scientific and Technical Information of China (English)

    Xiao-Tu MA; Min-Ping QIAN; Hai-Xu TANG

    2004-01-01

    Several discriminant functions for predicting core promoters, based on the potential cooperation between transcription factor binding sites (TFBSs), are discussed. It is demonstrated that promoter prediction accuracy improves when the cooperation among TFBSs is taken into consideration. The core promoter region of a newly discovered gene, CKLFSF1, is predicted to lie more than 1.5 kb away from the 5′ end of the transcript, in the last intron of its upstream gene, which was later confirmed experimentally. The core promoters of 3402 human RefSeq sequences, obtained by extending the mRNAs in human genome sequences, are predicted by our algorithm, and about 60% of the predicted core promoters lie within the ±500 bp region relative to the annotated transcription start site.

  10. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  11. Prediction and validation of gene-disease associations using methods inspired by social network analyses.

    Directory of Open Access Journals (Sweden)

    U Martin Singh-Blom

    Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets in biology, we can leverage statistical and machine learning methods to help us achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated from its success in social network link prediction, and is very closely related to some of the recent methods proposed for gene-disease association inference. The second method, called Catapult (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine where the features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies, on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, by measuring the performance of the methods using two different evaluation strategies, we show that even though both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas Catapult is better suited to correctly identifying gene-trait associations overall.
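
    The Katz measure scores a pair of nodes by summing the walks between them, with a walk of length l down-weighted by beta^l. The sketch below computes a truncated version on a tiny made-up graph. The damping factor, the truncation length, and the graph are assumptions for illustration, not the settings or data used in the paper.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def katz_truncated(adj, beta=0.1, max_len=3):
    """Truncated Katz scores: sum over l = 1..max_len of beta**l * (adj**l)."""
    n = len(adj)
    score = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # adj**1
    for length in range(1, max_len + 1):
        for i in range(n):
            for j in range(n):
                score[i][j] += (beta ** length) * power[i][j]
        power = matmul(power, adj)  # next power of the adjacency matrix
    return score


# Made-up path graph: gene0 - gene1 - trait2 (node 0 reaches node 2 only via 1).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
scores = katz_truncated(adj, beta=0.1, max_len=3)
print(scores[0][2])  # indirect association of gene0 with trait2
```

    Although gene0 has no direct edge to trait2, it still receives a non-zero score through the length-2 walk, which is the property that makes the measure useful for poorly studied genes.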

  12. Software Suite for Gene and Protein Annotation Prediction and Similarity Search.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2015-01-01

    In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene-functions based upon the available biomolecular annotations. Additionally, they may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some of these prediction methods and a computational technique based upon Latent Semantic Indexing (LSI), which leverages both inferred and available annotations to search for semantically similar genes. The suite consists of three components. BioAnnotationPredictor is a computational software module to predict new gene-functions based upon Singular Value Decomposition of available annotations. SimilBio is a Web module that leverages annotations available or predicted by BioAnnotationPredictor to discover similarities between genes via LSI. The suite includes also SemSim, a new Web service built upon these modules to allow accessing them programmatically. We integrated SemSim in the Bio Search Computing framework (http://www.bioinformatics.deib.polimi.it/bio-seco/seco/), where users can exploit the Search Computing technology to run multi-topic complex queries on multiple integrated Web services. Accordingly, researchers may obtain ranked answers involving the computation of the functional similarity between genes in support of biomedical knowledge discovery.

  13. Adipose gene expression prior to weight loss can differentiate and weakly predict dietary responders.

    Directory of Open Access Journals (Sweden)

    David M Mutch

    BACKGROUND: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. METHODOLOGY/PRINCIPAL FINDINGS: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Several statistical tests revealed that the gene expression profiles of responders (8-12 kgs weight loss) could always be differentiated from non-responders (<4 kgs weight loss). We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box) approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1%+/-8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier) improved prediction accuracy to 80.9%+/-2.2%. CONCLUSION: Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition.

  14. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Background: Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It remains of interest to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein-coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
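
    The abstract does not define the Entropy Density Profile. The sketch below assumes one formulation in which each of the 20 amino acids contributes s_i = -(1/H) p_i log p_i, where p_i is its frequency in the translated sequence and H is the Shannon entropy of those frequencies. Both the formula and the example sequence are assumptions for illustration and should be checked against the MED publications before use.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density_profile(protein):
    """Return {amino acid: s_i} with s_i = -(p_i * log p_i) / H (assumed form).

    The 20 components are non-negative and sum to 1.
    """
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    probs = {aa: counts[aa] / total for aa in AMINO_ACIDS}
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    if entropy == 0:
        raise ValueError("single-residue composition: profile is undefined")
    return {aa: (-p * math.log(p) / entropy if p > 0 else 0.0)
            for aa, p in probs.items()}


# Hypothetical short peptide, used only to exercise the function.
profile = entropy_density_profile("MKTAYIAKQRQISFVK")
print({aa: round(s, 3) for aa, s in profile.items() if s > 0})
```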

  15. Challenges of incorporating gene expression data to predict HCC prognosis in the age of systems biology

    Institute of Scientific and Technical Information of China (English)

    Yan Du; Guang-Wen Cao

    2012-01-01

    Hepatocellular carcinoma (HCC) is a leading cause of cancer-related death worldwide. The recurrence of HCC after curative treatments is currently a major hurdle. Identification of subsets of patients with distinct prognosis provides an opportunity to tailor therapeutic approaches as well as to select the patients with specific sub-phenotypes for targeted therapy. Thus, the development of gene expression profiles to improve the prediction of HCC prognosis is important for HCC management. Although several gene signatures have been evaluated for the prediction of HCC prognosis, there is no consensus on the predictive power of these signatures. Using systematic approaches to evaluate these signatures and combine them with clinicopathologic information may provide more accurate prediction of HCC prognosis. Recently, Villanueva et al [13] developed a composite prognostic model incorporating gene expression patterns in both tumor and adjacent tissues to predict HCC recurrence. In this commentary, we summarize the current progress in using gene signatures to predict HCC prognosis, and discuss the importance, existing issues and future research directions in this field.

  16. Global prediction of tissue-specific gene expression and context-dependent gene networks in Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Maria D Chikina

    2009-06-01

    Tissue-specific gene expression plays a fundamental role in metazoan biology and is an important aspect of many complex diseases. Nevertheless, an organism-wide map of tissue-specific expression remains elusive due to difficulty in obtaining these data experimentally. Here, we leveraged existing whole-animal Caenorhabditis elegans microarray data representing diverse conditions and developmental stages to generate accurate predictions of tissue-specific gene expression and experimentally validated these predictions. These patterns of tissue-specific expression are more accurate than existing high-throughput experimental studies for nearly all tissues; they also complement existing experiments by addressing tissue-specific expression present at particular developmental stages and in small tissues. We used these predictions to address several experimentally challenging questions, including the identification of tissue-specific transcriptional motifs and the discovery of potential miRNA regulation specific to particular tissues. We also investigate the role of tissue context in gene function through tissue-specific functional interaction networks. To our knowledge, this is the first study producing high-accuracy predictions of tissue-specific expression and interactions for a metazoan organism based on whole-animal data.

  17. Gene expression variation to predict 10-year survival in lymph-node-negative breast cancer

    Directory of Open Access Journals (Sweden)

    Karlsson Per

    2008-09-01

    Full Text Available Abstract Background It is of great significance to find better markers to correctly distinguish between high-risk and low-risk breast cancer patients since the majority of breast cancer cases are at present being overtreated. Methods 46 tumours from node-negative breast cancer patients were studied with gene expression microarrays. A t-test was carried out in order to find a set of genes where the expression might predict clinical outcome. Two classifiers were used for evaluation of the gene lists, a correlation-based classifier and a Voting Features Interval (VFI) classifier. We then evaluated the predictive accuracy of this expression signature on tumour sets from two similar studies on lymph-node negative patients. They had both developed gene expression signatures superior to current methods in classifying node-negative breast tumours. These two signatures were also tested on our material. Results A list of 51 genes whose expression profiles could predict clinical outcome with high accuracy in our material (96% or 89% accuracy in cross-validation, depending on type of classifier) was developed. When tested on two independent data sets, the expression signature based on the 51 identified genes had good predictive qualities in one of the data sets (74% accuracy), whereas its predictive value on the other data set was poor, presumably due to the fact that only 23 of the 51 genes were found in that material. We also found that previously developed expression signatures could predict clinical outcome well to moderately well in our material (72% and 61%, respectively). Conclusion The list of 51 genes derived in this study might have potential for clinical utility as a prognostic gene set, and may include candidate genes of potential relevance for clinical outcome in breast cancer. According to the predictions by this expression signature, 30 of the 46 patients may have benefited from different adjuvant treatment than they received.
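
    The gene-selection step described in this abstract (a t-test per gene, then ranking) can be illustrated with a minimal sketch. It assumes Welch's unequal-variance t statistic, which the abstract does not specify; the function names, gene names and expression values below are hypothetical toy data, not the study's material.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def rank_genes(expr, labels):
    """Rank genes by |t| between outcome groups.

    expr   -- {gene name: [expression value per tumour]}
    labels -- one 0/1 outcome label per tumour, in the same order
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    # Largest absolute t statistic first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: six tumours, three genes.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "GENE_A": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],  # clearly separated groups
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no difference in means
    "GENE_C": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],  # weak difference
}
ranking = rank_genes(expr, labels)
```

    In practice the ranking would have to be recomputed inside each cross-validation fold; selecting genes on all samples first inflates the estimated accuracy.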

  18. Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Jing; Ma, Zihao; Carr, Steven A.; Mertins, Philipp; Zhang, Hui; Zhang, Zhen; Chan, Daniel W.; Ellis, Matthew J. C.; Townsend, R. Reid; Smith, Richard D.; McDermott, Jason E.; Chen, Xian; Paulovich, Amanda G.; Boja, Emily S.; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Rodland, Karin D.; Liebler, Daniel C.; Zhang, Bing

    2016-11-11

    Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies.

  19. Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction

    Science.gov (United States)

    Wang, Jing; Ma, Zihao; Carr, Steven A.; Mertins, Philipp; Zhang, Hui; Zhang, Zhen; Chan, Daniel W.; Ellis, Matthew J. C.; Townsend, R. Reid; Smith, Richard D.; McDermott, Jason E.; Chen, Xian; Paulovich, Amanda G.; Boja, Emily S.; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Rodland, Karin D.; Liebler, Daniel C.; Zhang, Bing

    2017-01-01

    Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies.

  20. Prediction of metastasis from low-malignant breast cancer by gene expression profiling

    DEFF Research Database (Denmark)

    Thomassen, Mads; Tan, Qihua; Eiriksdottir, Freyja;

    2007-01-01

    Promising results for prediction of outcome in breast cancer have been obtained by genome wide gene expression profiling. Some studies have suggested that an extensive overtreatment of breast cancer patients might be reduced by risk assessment with gene expression profiling. A patient group hardly...... examined in these studies is the low-risk patients for whom outcome is very difficult to predict with currently used methods. These patients do not receive adjuvant treatment according to the guidelines of the Danish Breast Cancer Cooperative Group (DBCG). In this study, 26 tumors from low-risk patients...... demonstrated high cross-platform consistency of the classifiers. Higher performance of HUMAC32 was demonstrated among the low-malignant cancers compared with the 70-gene classifier. This suggests that although the metastatic potential to some extent is determined by the same genes in groups of tumors...

  1. Gene prediction in metagenomic fragments: A large scale machine learning approach

    Directory of Open Access Journals (Sweden)

    Morgenstern Burkhard

    2008-04-01

    Full Text Available Abstract Background Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion Large scale machine learning methods are well-suited for gene

  2. Network-based gene prediction for Plasmodium falciparum malaria towards genetics-based drug discovery

    OpenAIRE

    Chen, Yang; Xu, Rong

    2015-01-01

    Background Malaria is the most deadly parasitic infectious disease. Existing drug treatments have limited efficacy in malaria elimination, and the complex pathogenesis of the disease is not fully understood. Detecting novel malaria-associated genes not only contributes in revealing the disease pathogenesis, but also facilitates discovering new targets for anti-malaria drugs. Methods In this study, we developed a network-based approach to predict malaria-associated genes. We constructed a cros...

  3. Formal modeling of Gene Ontology annotation predictions based on factor graphs

    Science.gov (United States)

    Spetale, Flavio; Murillo, Javier; Tapia, Elizabeth; Arce, Débora; Ponce, Sergio; Bulacio, Pilar

    2016-04-01

    Gene Ontology (GO) is a hierarchical vocabulary for gene product annotation. Its synergy with machine learning classification methods has been widely used for the prediction of protein functions. Current classification methods rely on heuristic solutions to check the consistency with some aspects of the underlying GO structure. In this work we formalize the GO is-a relationship through predicate logic. Moreover, an ontology model based on Forney Factor Graph (FFG) is shown on a general fragment of Cellular Component GO.

  4. Prediction of key genes in ovarian cancer treated with decitabine based on network strategy.

    Science.gov (United States)

    Wang, Yu-Zhen; Qiu, Sheng-Chun

    2016-06-01

    The objective of the present study was to predict key genes in ovarian cancer before and after treatment with decitabine utilizing a network approach and to reveal the molecular mechanism. Pathogenic networks of ovarian cancer before and after treatment were identified based on known pathogenic genes (seed genes) and differentially expressed genes (DEGs) detected by the Significance Analysis of Microarrays (SAM) method. A weight was assigned to each gene in the pathogenic network and then candidate genes were evaluated. Topological properties (degree, betweenness, closeness and stress) of candidate genes were analyzed to identify pathogenic genes with higher confidence. Pathway enrichment analysis for candidate and seed genes was conducted. Validation of candidate gene expression in ovarian cancer was performed by reverse transcriptase-polymerase chain reaction (RT-PCR) assays. There were 73 nodes and 147 interactions in the pathogenic network before treatment, compared with 47 nodes and 66 interactions after treatment. A total of 32 candidate genes were identified in the before treatment group of ovarian cancer, of which 16 were also candidate genes after treatment and the others were silenced. We obtained 5 key genes (PIK3R2, CCNB1, IL2, IL1B and CDC6) for decitabine treatment that were validated by RT-PCR. In conclusion, we successfully identified 5 key genes (PIK3R2, CCNB1, IL2, IL1B and CDC6) and validated them, which provides insight into the molecular mechanisms of decitabine treatment; these genes may be potential pathogenic biomarkers for the therapy of ovarian cancer.

  5. Use of Information Measures and Their Approximations to Detect Predictive Gene-Gene Interaction

    Directory of Open Access Journals (Sweden)

    Jan Mielniczuk

    2017-01-01

    Full Text Available We reconsider the properties and relationships of the interaction information and its modified versions in the context of detecting the interaction of two SNPs for the prediction of a binary outcome when interaction information is positive. This property is called predictive interaction, and we state some new sufficient conditions for it to hold true. We also study chi square approximations to these measures. It is argued that interaction information is a different and sometimes more natural measure of interaction than the logistic interaction parameter especially when SNPs are dependent. We introduce a novel measure of predictive interaction based on interaction information and its modified version. In numerical experiments, which use copulas to model dependence, we study examples when the logistic interaction parameter is zero or close to zero for which predictive interaction is detected by the new measure, while it remains undetected by the likelihood ratio test.
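
    A minimal sketch of the quantity discussed in this abstract may help. It assumes one common definition of interaction information for two SNPs X1, X2 and a binary outcome Y, namely II = I((X1, X2); Y) - I(X1; Y) - I(X2; Y), estimated by plug-in frequencies; the function names and toy genotypes are hypothetical, and the modified measures and chi square approximations studied in the paper are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_information(x1, x2, y):
    """II = I((X1, X2); Y) - I(X1; Y) - I(X2; Y) (assumed definition).

    A positive value means the pair predicts Y better than the two
    variables do separately.
    """
    pair = list(zip(x1, x2))
    return (mutual_information(pair, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Hypothetical toy case: Y is the XOR of two independent binary markers,
# so neither marker alone is informative but the pair determines Y.
snp1 = [0, 0, 1, 1]
snp2 = [0, 1, 0, 1]
outcome = [0, 1, 1, 0]
ii_xor = interaction_information(snp1, snp2, outcome)
```

    In the XOR toy case both single-marker mutual informations are zero while the pair carries one full bit about the outcome, so the interaction information is positive, which is the situation the abstract calls predictive interaction.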

  6. A semi-supervised method for predicting transcription factor-gene interactions in Escherichia coli.

    Directory of Open Access Journals (Sweden)

    Jason Ernst

    2008-03-01

    Full Text Available While Escherichia coli has one of the most comprehensive datasets of experimentally verified transcriptional regulatory interactions of any organism, it is still far from complete. This presents a problem when trying to combine gene expression and regulatory interactions to model transcriptional regulatory networks. Using the available regulatory interactions to predict new interactions may lead to better coverage and more accurate models. Here, we develop SEREND (SEmi-supervised REgulatory Network Discoverer), a semi-supervised learning method that uses a curated database of verified transcriptional factor-gene interactions, DNA sequence binding motifs, and a compendium of gene expression data in order to make thousands of new predictions about transcription factor-gene interactions, including whether the transcription factor activates or represses the gene. Using genome-wide binding datasets for several transcription factors, we demonstrate that our semi-supervised classification strategy improves the prediction of targets for a given transcription factor. To further demonstrate the utility of our inferred interactions, we generated a new microarray gene expression dataset for the aerobic to anaerobic shift response in E. coli. We used our inferred interactions with the verified interactions to reconstruct a dynamic regulatory network for this response. The network reconstructed when using our inferred interactions was better able to correctly identify known regulators and suggested additional activators and repressors as having important roles during the aerobic-anaerobic shift interface.

  7. A classification-based framework for predicting and analyzing gene regulatory response.

    Science.gov (United States)

    Kundaje, Anshul; Middendorf, Manuel; Shah, Mihir; Wiggins, Chris H; Freund, Yoav; Leslie, Christina

    2006-03-20

    We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem--predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the Adaboost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a more

  8. Distance in cancer gene expression from stem cells predicts patient survival.

    Science.gov (United States)

    Riester, Markus; Wu, Hua-Jun; Zehir, Ahmet; Gönen, Mithat; Moreira, Andre L; Downey, Robert J; Michor, Franziska

    2017-01-01

    The degree of histologic cellular differentiation of a cancer has been associated with prognosis but is subjectively assessed. We hypothesized that information about tumor differentiation of individual cancers could be derived objectively from cancer gene expression data, and would allow creation of a cancer phylogenetic framework that would correlate with clinical, histologic and molecular characteristics of the cancers, as well as predict prognosis. Here we utilized mRNA expression data from 4,413 patient samples with 7 diverse cancer histologies to explore the utility of ordering samples by their distance in gene expression from that of stem cells. A differentiation baseline was obtained by including expression data of human embryonic stem cells (hESC) and human mesenchymal stem cells (hMSC) for solid tumors, and of hESC and CD34+ cells for liquid tumors. We found that the correlation distance (the degree of similarity) between the gene expression profile of a tumor sample and that of stem cells orients cancers in a clinically coherent fashion. For all histologies analyzed (including carcinomas, sarcomas, and hematologic malignancies), patients with cancers with gene expression patterns most similar to that of stem cells had poorer overall survival. We also found that the genes in all undifferentiated cancers of diverse histologies that were most differentially expressed were associated with up-regulation of specific oncogenes and down-regulation of specific tumor suppressor genes. Thus, a stem cell-oriented phylogeny of cancers allows for the derivation of a novel cancer gene expression signature found in all undifferentiated forms of diverse cancer histologies, that is competitive in predicting overall survival in cancer patients compared to previously published prediction models, and is coherent in that gene expression was associated with up-regulation of specific oncogenes and down-regulation of specific tumor suppressor genes associated with regulation of
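
    The ordering of samples by their distance from a stem cell profile can be sketched as follows. This assumes that "correlation distance" means 1 minus the Pearson correlation between two expression vectors, a common convention that the abstract does not spell out; the sample names and values are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_distance(a, b):
    """Assumed definition: 1 - Pearson r (0 = same shape, 2 = opposite)."""
    return 1.0 - pearson(a, b)


def order_by_stem_distance(samples, stem_profile):
    """Return sample names sorted from most to least stem-cell-like."""
    return sorted(
        samples,
        key=lambda name: correlation_distance(samples[name], stem_profile),
    )


# Hypothetical expression values over the same four genes.
stem_profile = [5.0, 1.0, 4.0, 2.0]
samples = {
    "tumor_A": [10.0, 2.0, 8.0, 4.0],  # scaled copy of the stem profile
    "tumor_B": [2.0, 4.0, 1.0, 5.0],   # roughly opposite pattern
    "tumor_C": [4.0, 2.0, 3.0, 3.0],   # partially similar pattern
}
ordering = order_by_stem_distance(samples, stem_profile)
```

    Because Pearson correlation ignores scale and offset, a tumor whose profile is a scaled copy of the stem cell profile sits at distance 0 regardless of overall expression level.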

  9. Prediction and experimental validation of novel STAT3 target genes in human cancer cells.

    Directory of Open Access Journals (Sweden)

    Young Min Oh

    Full Text Available The comprehensive identification of functional transcription factor binding sites (TFBSs) is an important step in understanding complex transcriptional regulatory networks. This study presents a motif-based comparative approach, STAT-Finder, for identifying functional DNA binding sites of STAT3 transcription factor. STAT-Finder combines STAT-Scanner, which was designed to predict functional STAT TFBSs with improved sensitivity, and a motif-based alignment to minimize false positive prediction rates. Using two reference sets containing promoter sequences of known STAT3 target genes, STAT-Finder identified functional STAT3 TFBSs with enhanced prediction efficiency and sensitivity relative to other conventional TFBS prediction tools. In addition, STAT-Finder identified novel STAT3 target genes among a group of genes that are over-expressed in human cancer cells. The binding of STAT3 to the predicted TFBSs was also experimentally confirmed through chromatin immunoprecipitation. Our proposed method provides a systematic approach to the prediction of functional TFBSs that can be applied to other TFs.

  10. Can Thrifty Gene(s) or Predictive Fetal Programming for Thriftiness Lead to Obesity?

    Science.gov (United States)

    Baig, Ulfat; Belsare, Prajakta; Watve, Milind; Jog, Maithili

    2011-01-01

    Obesity and related disorders are thought to have their roots in metabolic "thriftiness" that evolved to combat periodic starvation. The association of low birth weight with obesity in later life caused a shift in the concept from thrifty gene to thrifty phenotype or anticipatory fetal programming. The assumption of thriftiness is implicit in obesity research. We examine here, with the help of a mathematical model, the conditions for evolution of thrifty genes or fetal programming for thriftiness. The model suggests that a thrifty gene cannot exist in a stable polymorphic state in a population. The conditions for evolution of thrifty fetal programming are restricted if the correlation between intrauterine and lifetime conditions is poor. Such a correlation is not observed in natural courses of famine. If there is fetal programming for thriftiness, it could have evolved in anticipation of social factors affecting nutrition that can result in a positive correlation.

  11. Can Thrifty Gene(s) or Predictive Fetal Programming for Thriftiness Lead to Obesity?

    Directory of Open Access Journals (Sweden)

    Ulfat Baig

    2011-01-01

    Full Text Available Obesity and related disorders are thought to have their roots in metabolic “thriftiness” that evolved to combat periodic starvation. The association of low birth weight with obesity in later life caused a shift in the concept from thrifty gene to thrifty phenotype or anticipatory fetal programming. The assumption of thriftiness is implicit in obesity research. We examine here, with the help of a mathematical model, the conditions for evolution of thrifty genes or fetal programming for thriftiness. The model suggests that a thrifty gene cannot exist in a stable polymorphic state in a population. The conditions for evolution of thrifty fetal programming are restricted if the correlation between intrauterine and lifetime conditions is poor. Such a correlation is not observed in natural courses of famine. If there is fetal programming for thriftiness, it could have evolved in anticipation of social factors affecting nutrition that can result in a positive correlation.

  12. Entropy-based gene ranking without selection bias for the predictive classification of microarray data

    Directory of Open Access Journals (Sweden)

    Serafini Maria

    2003-11-01

    Full Text Available Abstract Background We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). Results With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Conclusions Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
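
    The idea of using the entropy of the weight distribution to decide how many genes to eliminate per step can be sketched as below. This is an illustration only: the binning, the threshold and the elimination rule are assumptions made for the example and are not taken from the E-RFE paper, and the weights are hypothetical stand-ins for absolute SVM weights.

```python
from math import log2


def bin_indices(weights, n_bins):
    """Map each weight to a histogram bin after rescaling to [0, 1]."""
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0  # avoid division by zero for constant weights
    return [min(int((w - lo) / span * n_bins), n_bins - 1) for w in weights]


def weight_entropy(weights, n_bins=4):
    """Shannon entropy (bits) of the binned weight distribution."""
    counts = [0] * n_bins
    for idx in bin_indices(weights, n_bins):
        counts[idx] += 1
    n = len(weights)
    return -sum(c / n * log2(c / n) for c in counts if c)


def genes_to_drop(weights, n_bins=4, entropy_threshold=1.0):
    """Hypothetical chunking rule (not the published one).

    Low entropy: weights pile up in few bins, so every gene in the
    lowest bin is dropped at once. Otherwise only the single
    lowest-weight gene is dropped, as in standard RFE.
    """
    if weight_entropy(weights, n_bins) < entropy_threshold:
        bins = bin_indices(weights, n_bins)
        return [i for i, b in enumerate(bins) if b == 0]
    return [min(range(len(weights)), key=weights.__getitem__)]


# Hypothetical absolute weights: one dominant gene, seven near zero.
weights = [0.01, 0.02, 0.01, 0.03, 0.02, 0.9, 0.01, 0.02]
dropped = genes_to_drop(weights)
```

    With these toy weights seven genes fall in the lowest bin and are removed in a single step, which is the kind of shortcut that makes chunked elimination faster than removing one gene per SVM retraining.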

  13. An atlas of tissue-specific conserved coexpression for functional annotation and disease gene prediction.

    Science.gov (United States)

    Piro, Rosario Michael; Ala, Ugo; Molineris, Ivan; Grassi, Elena; Bracco, Chiara; Perego, Gian Paolo; Provero, Paolo; Di Cunto, Ferdinando

    2011-11-01

    Gene coexpression relationships that are phylogenetically conserved between human and mouse have been shown to provide important clues about gene function that can be efficiently used to identify promising candidate genes for human hereditary disorders. In the past, such approaches have considered mostly generic gene expression profiles that cover multiple tissues and organs. The individual genes of multicellular organisms, however, can participate in different transcriptional programs, operating at scales as different as single-cell types, tissues, organs, body regions or the entire organism. Therefore, systematic analysis of tissue-specific coexpression could be, in principle, a very powerful strategy to dissect those functional relationships among genes that emerge only in particular tissues or organs. In this report, we show that, in fact, conserved coexpression as determined from tissue-specific and condition-specific data sets can predict many functional relationships that are not detected by analyzing heterogeneous microarray data sets. More importantly, we find that, when combined with disease networks, the simultaneous use of both generic (multi-tissue) and tissue-specific conserved coexpression allows a more efficient prediction of human disease genes than the use of generic conserved coexpression alone. Using this strategy, we were able to identify high-probability candidates for 238 orphan disease loci. We provide proof of concept that this combined use of generic and tissue-specific conserved coexpression can be very useful to prioritize the mutational candidates obtained from deep-sequencing projects, even in the case of genetic disorders as heterogeneous as XLMR.

  14. Epigenomic modifications predict active promoters and gene structure in Toxoplasma gondii.

    Directory of Open Access Journals (Sweden)

    Mathieu Gissot

    2007-06-01

    Full Text Available Mechanisms of gene regulation are poorly understood in Apicomplexa, a phylum that encompasses deadly human pathogens like Plasmodium and Toxoplasma. Initial studies suggest that epigenetic phenomena, including histone modifications and chromatin remodeling, have a profound effect upon gene expression and expression of virulence traits. Using the model organism Toxoplasma gondii, we characterized the epigenetic organization and transcription patterns of a contiguous 1% of the T. gondii genome using custom oligonucleotide microarrays. We show that methylation and acetylation of histones H3 and H4 are landmarks of active promoters in T. gondii that allow us to deduce the position and directionality of gene promoters with >95% accuracy. These histone methylation and acetylation "activation" marks are strongly associated with gene expression. We also demonstrate that the pattern of histone H3 arginine methylation distinguishes certain promoters, illustrating the complexity of the histone modification machinery in Toxoplasma. By integrating epigenetic data, gene prediction analysis, and gene expression data from the tachyzoite stage, we illustrate feasibility of creating an epigenomic map of T. gondii tachyzoite gene expression. Further, we illustrate the utility of the epigenomic map to empirically and biologically annotate the genome and show that this approach enables identification of previously unknown genes. Thus, our epigenomics approach provides novel insights into regulation of gene expression in the Apicomplexa. In addition, with its compact genome, genetic tractability, and discrete life cycle stages, T. gondii provides an important new model to study the evolutionarily conserved components of the histone code.

  15. Comparison of gene expression profiles predicting progression in breast cancer patients treated with tamoxifen.

    Science.gov (United States)

    Kok, Marleen; Linn, Sabine C; Van Laar, Ryan K; Jansen, Maurice P H M; van den Berg, Teun M; Delahaye, Leonie J M J; Glas, Annuska M; Peterse, Johannes L; Hauptmann, Michael; Foekens, John A; Klijn, Jan G M; Wessels, Lodewyk F A; Van't Veer, Laura J; Berns, Els M J J

    2009-01-01

    Molecular signatures that predict outcome in tamoxifen treated breast cancer patients have been identified. For the first time, we compared these response profiles in an independent cohort of (neo)adjuvant systemic treatment naïve breast cancer patients treated with first-line tamoxifen for metastatic disease. From a consecutive series of 246 estrogen receptor (ER) positive primary tumors, gene expression profiling was performed on available frozen tumors using 44K oligoarrays (n = 69). A 78-gene tamoxifen response profile (formerly consisting of 81 cDNA-clones), a 21-gene set (microarray-based Recurrence Score), as well as the HOXB13-IL17BR ratio (Two-Gene-Index, RT-PCR) were analyzed. Performance of signatures in relation to time to progression (TTP) was compared with standard immunohistochemical (IHC) markers: ER, progesterone receptor (PgR) and HER2. In univariate analyses, the 78-gene tamoxifen response profile, 21-gene set and HOXB13-IL17BR ratio were all significantly associated with TTP with hazard ratios of 2.2 (95% CI 1.3-3.7, P = 0.005), 2.3 (95% CI 1.3-4.0, P = 0.003) and 4.2 (95% CI 1.4-12.3, P = 0.009), respectively. The concordance among the three classifiers was relatively low; they classified only 45-61% of patients in the same category. In multivariate analyses, the association remained significant for the 78-gene profile and the 21-gene set after adjusting for ER and PgR. The 78-gene tamoxifen response profile, the 21-gene set and the HOXB13-IL17BR ratio were all significantly associated with TTP in an independent patient series treated with tamoxifen. The addition of multigene assays to ER (IHC) improves the prediction of outcome in tamoxifen treated patients and deserves incorporation in future clinical studies.

  16. Learning "graph-mer" motifs that predict gene expression trajectories in development.

    Directory of Open Access Journals (Sweden)

    Xuejing Li

    2010-04-01

    A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns--represented by graphs of k-mers, or "graph-mers"--that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated with these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.

  17. Prediction of Sinorhizobium meliloti sRNA genes and experimental detection in strain 2011

    Directory of Open Access Journals (Sweden)

    Becker Anke

    2008-09-01

    Background: Small non-coding RNAs (sRNAs) have emerged as ubiquitous regulatory elements in bacteria and other life domains. However, few sRNAs have been identified outside several well-studied species of gamma-proteobacteria and thus relatively little is known about the role of RNA-mediated regulation in most other bacterial genera. Here we have conducted a computational prediction of putative sRNA genes in intergenic regions (IgRs) of the symbiotic α-proteobacterium S. meliloti 1021 and experimentally confirmed the expression of dozens of these candidate loci in the closely related strain S. meliloti 2011. Results: Our first sRNA candidate compilation was based mainly on the output of the sRNAPredictHT algorithm. A thorough manual sequence analysis of the curated list rendered an initial set of 18 IgRs of interest, from which 14 candidates were detected in strain 2011 by Northern blot and/or microarray analysis. Interestingly, the intracellular transcript levels varied in response to various stress conditions. We developed an alternative computational method to more sensitively predict sRNA-encoding genes and score these predicted genes based on several features to allow identification of the strongest candidates. With this novel strategy, we predicted 60 chromosomal independent transcriptional units that, according to our annotation, represent strong candidates for sRNA-encoding genes, including most of the sRNAs experimentally verified in this work and in two other contemporary studies. Additionally, we predicted numerous candidate sRNA genes encoded in megaplasmids pSymA and pSymB. A significant proportion of the chromosomal- and megaplasmid-borne putative sRNA genes were validated by microarray analysis in strain 2011. Conclusion: Our data extend the number of experimentally detected S. meliloti sRNAs and significantly expand the list of putative sRNA-encoding IgRs in this and closely related α-proteobacteria. In addition, we have

  18. Gene expression prediction by soft integration and the elastic net-best performance of the DREAM3 gene expression challenge.

    Directory of Open Access Journals (Sweden)

    Mika Gustafsson

    BACKGROUND: Predicting gene expression is an important endeavour within computational systems biology. It can both be a way to explore how drugs affect the system and provide a framework for finding which genes are interrelated in a certain process. A practical problem, however, is how to assess and discriminate among the various algorithms which have been developed for this purpose. Therefore, in 2008 the DREAM project issued a challenge for predicting gene expression values, and here we present the algorithm with the best performance. METHODOLOGY/PRINCIPAL FINDINGS: We develop an algorithm by exploring various regression schemes with different model selection procedures. It turns out that the most effective scheme is based on least squares, with a penalty term of a recently developed form called the "elastic net". Key components in the algorithm are the integration of expression data from experimental conditions other than those presented for the challenge and the utilization of transcription factor binding data for guiding the inference process towards known interactions. Also of importance is a cross-validation procedure where each form of external data is used only to the extent it increases the expected performance. CONCLUSIONS/SIGNIFICANCE: Our algorithm demonstrates both the possibility of extracting information from large-scale expression data concerning prediction of gene levels and the benefits of integrating different data sources for improving the inference. We believe the former is an important message to those still hesitating on the possibilities for computational approaches, while the latter is part of an important way forward for the future development of the field of computational systems biology.
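
    The least-squares-plus-elastic-net scheme named in the abstract above can be illustrated with a minimal coordinate-descent sketch. It assumes centered data, no intercept, and a commonly used form of the objective, (1/2n)·||y − Xw||² + lam·(alpha·||w||₁ + (1 − alpha)/2·||w||²). The function names, the `lam` and `alpha` parameters and the toy data are illustrative and do not come from the paper; the integration of external data and the cross-validation step are not reproduced.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t (the L1 part of the penalty)."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for least squares with an elastic-net penalty.

    X is a list of rows, y a list of targets. alpha=1 gives the lasso,
    alpha=0 gives ridge regression, lam=0 gives ordinary least squares.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            denom = z + lam * (1.0 - alpha)
            # Guard against an all-zero feature with a pure L1 penalty.
            w[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
    return w


# Hypothetical toy data: the target depends only on the first feature.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 0.0, 2.0, 0.0]

w_ols = elastic_net(X_toy, y_toy, lam=0.0)                # no penalty
w_lasso = elastic_net(X_toy, y_toy, lam=0.5, alpha=1.0)   # shrunk coefficient
w_zero = elastic_net(X_toy, y_toy, lam=10.0, alpha=1.0)   # penalty dominates
```

    Increasing `lam` shrinks the coefficients and, through the L1 term, sets weak ones exactly to zero, which is why such penalties are attractive when there are many candidate regulators and few measurements.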

  19. ETS Gene Fusions as Predictive Biomarkers of Resistance to Radiation Therapy for Prostate Cancer

    Science.gov (United States)

    2016-05-01

    Award Number: W81XWH-10-1-0582. Title: ETS Gene Fusions as Predictive Biomarkers of Resistance to Radiation Therapy for Prostate Cancer... Grant Number: W81XWH... Abstract: The research goals of this grant proposal are to: 1) investigate the effect of ETS gene fusions on radiation

  20. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity.

    Directory of Open Access Journals (Sweden)

    Slavé Petrovski

    2015-09-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding

  1. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity.

    Science.gov (United States)

    Petrovski, Slavé; Gussow, Ayal B; Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H; Allen, Andrew S; Goldstein, David B

    2015-09-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, nc

  2. Minimal gene selection for classification and diagnosis prediction based on gene expression profile

    Directory of Open Access Journals (Sweden)

    Alireza Mehridehnavi

    2013-01-01

    Conclusion: We have shown that using the two most significant genes, ranked by their S/N ratios, together with a suitable selection of training samples, classifies DLBCL patients with rather good results. With these methods we could compensate for the small number of patients, improve classification accuracy, and reduce computational complexity and thus running time.
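
    The abstract above does not define its S/N ratio. A minimal sketch of ranking genes by signal-to-noise ratio, assuming the widely used definition (difference of class means divided by the sum of class standard deviations), could look as follows. The gene names, expression values and function names are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """S/N ratio of one gene: (mean_a - mean_b) / (sd_a + sd_b).

    This definition is an assumption; the abstract does not state one.
    Raises ZeroDivisionError if both classes have zero spread.
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def top_genes(expr_a, expr_b, k=2):
    """Return the k genes with the largest absolute S/N ratio."""
    scores = {g: signal_to_noise(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


# Hypothetical expression values for three genes in two patient classes.
expr_class_a = {
    "gene1": [5.0, 5.2, 4.8],
    "gene2": [1.0, 3.0, 2.0],
    "gene3": [9.0, 9.1, 8.9],
}
expr_class_b = {
    "gene1": [1.0, 1.2, 0.8],
    "gene2": [1.5, 2.5, 2.0],
    "gene3": [2.0, 2.2, 1.8],
}

selected = top_genes(expr_class_a, expr_class_b, k=2)
```

    Here `gene2` has identical class means and therefore an S/N ratio of zero, so it is not selected; the two retained genes would then feed whatever classifier is trained on the chosen samples.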

  3. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    Science.gov (United States)

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.
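
    The abstract above does not spell out its extension or GO correcting rules. The following sketch shows one plausible reading, stated as an assumption: if a protein outside a maximal clique interacts with all members but one, the missing interaction is proposed, and it is kept only when the two proteins share at least one annotation term. The protein names, term labels and function names are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def predict_missing_edges(adj, go_terms, min_clique=3):
    """Propose edges that would extend a clique, filtered by shared terms.

    The 'all members but one' rule and the shared-term filter are
    illustrative assumptions, not the rules used in the paper.
    """
    predicted = set()
    for clique in maximal_cliques(adj):
        if len(clique) < min_clique:
            continue
        for v in set(adj) - clique:
            missing = clique - adj[v]
            if len(missing) == 1:
                (u,) = missing
                if go_terms.get(u, set()) & go_terms.get(v, set()):
                    predicted.add(frozenset((u, v)))
    return predicted


# Hypothetical network: A, B, C form a clique; D touches A and B only.
toy_adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}
toy_terms = {"C": {"term_x"}, "D": {"term_x", "term_y"}}

toy_predictions = predict_missing_edges(toy_adj, toy_terms)
```

    Adding the proposed C-D interaction to the toy network would merge the two triangles into a single four-protein clique, which is the kind of clique extension the abstract describes.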

  4. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.

  5. Gene expression-based classification of non-small cell lung carcinomas and survival prediction.

    Directory of Open Access Journals (Sweden)

    Jun Hou

    BACKGROUND: Current clinical therapy of non-small cell lung cancer depends on histo-pathological classification. This approach poorly predicts clinical outcome for individual patients. Gene expression profiling holds promise to improve clinical stratification, thus paving the way for individualized therapy. METHODOLOGY AND PRINCIPAL FINDINGS: A genome-wide gene expression analysis was performed on a cohort of 91 patients. We used 91 tumor- and 65 adjacent normal lung tissue samples. We defined sets of predictor genes (probe sets) with the expression profiles. The power of predictor genes was evaluated using an independent cohort of 96 non-small cell lung cancer- and 6 normal lung samples. We identified a tumor signature of 5 genes that aggregates the 156 tumor and normal samples into the expected groups. We also identified a histology signature of 75 genes, which classifies the samples in the major histological subtypes of non-small cell lung cancer. Correlation analysis identified 17 genes which showed the best association with post-surgery survival time. This signature was used for stratification of all patients in two risk groups. Kaplan-Meier survival curves show that the two groups display a significant difference in post-surgery survival time (p = 5.6E-6). The performance of the signatures was validated using a patient cohort of similar size (Duke University, n = 96). Compared to previously published prognostic signatures for NSCLC, the 17 gene signature performed well on these two cohorts. CONCLUSIONS: The gene signatures identified are promising tools for histo-pathological classification of non-small cell lung cancer, and may improve the prediction of clinical outcome.

  6. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data

    Science.gov (United States)

    2013-01-01

    Background High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. Results We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. Conclusions We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments. PMID:24053776

  7. The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science.

    Science.gov (United States)

    Klie, Sebastian; Nikoloski, Zoran

    2012-01-01

    Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of co-expression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

  8. The choice between MapMan and Gene Ontology for automated gene function prediction in plant science

    Directory of Open Access Journals (Sweden)

    Sebastian Klie

    2012-06-01

    Since the introduction of the Gene Ontology (GO), the analysis of high-throughput data has become tightly coupled with the use of ontologies to establish associations between knowledge and data in an automated fashion. Ontologies provide a systematic description of knowledge by a controlled vocabulary of defined structure in which ontological concepts are connected by pre-defined relationships. In plant science, MapMan and GO offer two alternatives for ontology-driven analyses. Unlike GO, initially developed to characterize microbial systems, MapMan was specifically designed to cover plant-specific pathways and processes. While the dependencies between concepts in MapMan are modeled as a tree, in GO these are captured in a directed acyclic graph. Therefore, the difference in ontologies may cause discrepancies in data reduction, visualization, and hypothesis generation. Here we provide the first systematic comparative analysis of GO and MapMan for the case of the model plant species Arabidopsis thaliana (Arabidopsis) with respect to their structural properties and difference in distributions of information content. In addition, we investigate the effect of the two ontologies on the specificity and sensitivity of automated gene function prediction via the coupling of coexpression networks and the guilt-by-association principle. Automated gene function prediction is particularly needed for the model plant Arabidopsis in which only half of genes have been functionally annotated based on sequence similarity to known genes. The results highlight the need for structured representation of species-specific biological knowledge, and warrant caution in the design principles employed in future ontologies.

  9. Gene panel model predictive of outcome in patients with prostate cancer.

    Science.gov (United States)

    Rabiau, Nadège; Dantal, Yann; Guy, Laurent; Ngollo, Marjolaine; Dagdemir, Aslihan; Kemeny, Jean-Louis; Terris, Benoît; Vieillefond, Annick; Boiteux, Jean-Paul; Bignon, Yves-Jean; Bernard-Gallon, Dominique

    2013-08-01

    In men at high risk for prostate cancer, established clinical and pathological parameters provide only limited prognostic information. Here we analyzed a French cohort of 103 prostate cancer patients and developed a gene panel model predictive of outcome in this group of patients. The model comprised a 15-gene TaqMan Low-Density Array (TLDA) card, with gene expressions compared to a standardized reference. The RQ value for each gene was calculated, and a scoring system was developed. Summing all the binary scores (0 or 1) corresponding to the 15 genes gives a global score between 0 and 15. This global score can be compared to the Gleason score (0 to 10) by recalculating it into a 0-10 scaled score. A scaled score ≥2 suggested that the patient has prostate cancer, and a scaled score ≥7 flagged aggressive cancer. Statistical analyses demonstrated a strongly significant linear correlation (p=3.50E-08) between scaled score and Gleason score for this prostate cancer cohort (N=103). These results support the capacity of this designed 15 target gene TLDA card approach to predict outcome in prostate cancer, opening up a new avenue for personalized medicine through future independent replication and applications for rapid identification of aggressive prostate cancer phenotypes for early intervention.
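
    The scoring arithmetic described in the abstract above can be sketched as follows. The sum of 15 binary scores and the thresholds of 2 and 7 come from the abstract; the linear rescaling (global score × 10 / 15) is an assumption, since the abstract only says the score is recalculated to a 0-10 scale, and the function names and example panels are hypothetical.

```python
def scaled_panel_score(binary_scores, n_genes=15):
    """Sum the per-gene binary scores and rescale the total to 0-10.

    The linear rescaling is an assumption about how the 0-15 global
    score is mapped onto the 0-10 scale.
    """
    if len(binary_scores) != n_genes:
        raise ValueError(f"expected {n_genes} scores, got {len(binary_scores)}")
    if any(s not in (0, 1) for s in binary_scores):
        raise ValueError("each score must be 0 or 1")
    global_score = sum(binary_scores)
    return global_score * 10 / n_genes


def interpret(scaled_score):
    """Apply the two thresholds quoted in the abstract."""
    if scaled_score >= 7:
        return "aggressive cancer flagged"
    if scaled_score >= 2:
        return "prostate cancer suggested"
    return "below both thresholds"


# Hypothetical panels with 2, 3 and 11 positive genes out of 15.
panel_low = [1] * 2 + [0] * 13
panel_mid = [1] * 3 + [0] * 12
panel_high = [1] * 11 + [0] * 4

labels = [interpret(scaled_panel_score(p)) for p in (panel_low, panel_mid, panel_high)]
```

    Under the linear assumption, three positive genes are the minimum needed to reach the lower threshold (3 × 10 / 15 = 2.0) and eleven are needed to reach the upper one (11 × 10 / 15 ≈ 7.3).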

  10. HOX Gene Promoter Prediction and Inter-genomic Comparison: An Evo-Devo Study

    Directory of Open Access Journals (Sweden)

    Marla A. Endriga

    2010-10-01

    Homeobox genes direct the anterior-posterior axis of the body plan in eukaryotic organisms. Promoter regions upstream of the Hox genes jumpstart the transcription process. CpG islands found within the promoter regions can cause silencing of these promoters. The locations of the promoter regions and the CpG islands of Homo sapiens sapiens (human), Pan troglodytes (chimpanzee), Mus musculus (mouse), and Rattus norvegicus (brown rat) are compared and related to the possible influence on the specification of the mammalian body plan. The sequence of each gene in Hox clusters A-D of the mammals considered was retrieved from Ensembl, and the locations of promoter regions and CpG islands were predicted using Exon Finder. The predicted promoter sequences were confirmed via BLAST and verified against the Eukaryotic Promoter Database. The significance of the locations was determined using the Kruskal-Wallis test. Among the four clusters, only promoter locations in cluster B showed a significant difference. HOX B genes have been linked with the control of genes that direct the development of axial morphology, particularly of the vertebral column bones. The magnitude of variation among the body plans of closely related species can thus be partially attributed to the promoter kind, location and number, and gene inactivation via CpG methylation.
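
    For readers unfamiliar with the Kruskal-Wallis test mentioned in the abstract above, here is a minimal sketch of its H statistic, H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1), where R_i is the rank sum of group i. Tied values receive average ranks, but the usual tie correction factor is omitted for brevity. The promoter positions are hypothetical and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction for a list of sample groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical promoter positions (arbitrary units) in three species.
positions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(positions)
```

    The statistic is then compared with a chi-squared distribution with (number of groups − 1) degrees of freedom; a large H indicates that at least one group tends to have systematically different values.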

  11. Genome-wide Transcription Factor Gene Prediction and their Expressional Tissue-Specificities in Maize

    Institute of Scientific and Technical Information of China (English)

    Yi Jiang; Biao Zeng; Hainan Zhao; Mei Zhang; Shaojun Xie; Jinsheng Lai

    2012-01-01

    Transcription factors (TFs) are important regulators of gene expression. To better understand TF-encoding genes in maize (Zea mays L.), a genome-wide TF prediction was performed using the updated B73 reference genome. A total of 2,298 TF genes were identified, which can be classified into 56 families. The largest family, known as the MYB superfamily, comprises 322 MYB and MYB-related TF genes. The expression patterns of 2,014 (87.64%) TF genes were examined using RNA-seq data, which resulted in the identification of a subset of TFs that are specifically expressed in particular tissues (including root, shoot, leaf, ear, tassel and kernel). Similarly, 98 kernel-specific TF genes were further analyzed, and it was observed that 29 of the kernel-specific genes were preferentially expressed in the early kernel developmental stage, while 69 of the genes were expressed in the late kernel developmental stage. Identification of these TFs, particularly the tissue-specific ones, provides important information for the understanding of development and transcriptional regulation of maize.

  12. The utility and predictive value of combinations of low penetrance genes for screening and risk prediction of colorectal cancer.

    Science.gov (United States)

    Hawken, Steven J; Greenwood, Celia M T; Hudson, Thomas J; Kustra, Rafal; McLaughlin, John; Yang, Quanhe; Zanke, Brent W; Little, Julian

    2010-07-01

    Although colorectal cancer (CRC) is a highly treatable form of cancer if detected early, a very low proportion of the eligible population undergoes screening for it. Integrating a genomic screening profile as a component of existing screening programs for CRC could potentially improve the effectiveness of population screening by allowing the assignment of individuals to different types and intensities of screening and also by potentially increasing the uptake of existing screening programs. We evaluated the utility and predictive value of genomic profiling as applied to CRC, and as a potential component of a population-based cancer screening program. We generated simulated data representing a typical North American population including a variety of genetic profiles, with a range of relative risks and prevalences for individual risk genes. We then used these data to estimate parameters characterizing the predictive value of a logistic regression model built on genetic markers for CRC. Meta-analyses of genetic associations with CRC were used to inform the simulation work, and to select genetic variants to include in logistic regression model-building using data from the ARCTIC study in Ontario, which included 1,200 CRC cases and a similar number of cancer-free population-based controls. Our simulations demonstrate that, for reasonable assumptions involving modest relative risks for individual genetic variants, substantial predictive power can be achieved when risk variants are common (e.g., prevalence > 20%) and data for enough risk variants are available (e.g., approximately 140-160). Pilot work in population data shows modest, but statistically significant predictive utility for a small collection of risk variants, smaller in effect than age and gender alone in predicting an individual's CRC risk. Further genotyping and many more samples will be required, and indeed the discovery of many more risk loci

  13. Convergence of mutation and epigenetic alterations identifies common genes in cancer that predict for poor prognosis.

    Directory of Open Access Journals (Sweden)

    Timothy A Chan

    2008-05-01

    -wide approach, our analysis has enabled the discovery of a number of clinically significant genes targeted by multiple modes of inactivation in breast and colon cancer. Importantly, we demonstrate that a subset of these genes predict strongly for poor clinical outcome. Our data define a set of genes that are targeted by both genetic and epigenetic events, predict for clinical prognosis, and are likely fundamentally important for cancer initiation or progression.

  14. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Background: Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results: We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion: MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly and free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  15. Predicting the size of the progeny mapping population required to positionally clone a gene.

    Science.gov (United States)

    Dinka, Stephen J; Campbell, Matthew A; Demers, Tyler; Raizada, Manish N

    2007-08-01

    A key frustration during positional gene cloning (map-based cloning) is that the size of the progeny mapping population is difficult to predict, because the meiotic recombination frequency varies along chromosomes. We describe a detailed methodology to improve this prediction using rice (Oryza sativa L.) as a model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping population size by comparing these theoretical estimates to 41 successful positional cloning attempts. We then used each validated equation to test whether neighborhood meiotic recombination frequencies extracted from a reference RFLP map can help researchers predict the mapping population size. We developed a meiotic recombination frequency map (MRFM) for approximately 1400 marker intervals in rice and anchored each published allele onto an interval on this map. We show that neighborhood recombination frequencies (R-map, >280-kb segments) extracted from the MRFM, in conjunction with the validated formulas, better predicted the mapping population size than the genome-wide average recombination frequency (R-avg), with improved results whether the recombination frequency was calculated as genes/cM or kb/cM. Our results offer a detailed road map for better predicting mapping population size in diverse eukaryotes, but useful predictions will require robust recombination frequency maps based on sampling more progeny.

  16. Refining ensembles of predicted gene regulatory networks based on characteristic interaction sets.

    Directory of Open Access Journals (Sweden)

    Lukas Windhager

    Full Text Available Different ensemble voting approaches have been successfully applied for reverse-engineering of gene regulatory networks. They are based on the assumption that a good approximation of true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness as compared to considering a single best scoring network only. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities and ensemble voting performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, where first mutual dependencies between pairs of interactions are derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret as their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures. The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate
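The basic ensemble-voting idea described in this abstract (scoring each interaction by how often it appears across many predicted networks) can be sketched as follows. This is a minimal illustration only: the function name `ensemble_vote`, the `min_fraction` cutoff, and the toy networks are assumptions made for the example, and the grouping by characteristic interaction sets proposed in the paper is not implemented here.

```python
from collections import Counter

def ensemble_vote(networks, min_fraction=0.5):
    """Keep edges that occur in at least `min_fraction` of the predicted networks.

    `networks` is a list of edge sets; each edge is a (regulator, target) tuple.
    Returns a dict mapping each retained edge to its frequency in the ensemble.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    n = len(networks)
    return {edge: c / n for edge, c in counts.items() if c / n >= min_fraction}

# Hypothetical toy ensemble of four predicted networks.
networks = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "D")},
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B")},
]
consensus = ensemble_vote(networks, min_fraction=0.5)
strict = ensemble_vote(networks, min_fraction=0.75)
```

A single vote over all networks, as here, is exactly the situation the abstract warns about when topologies differ strongly; voting separately within groups of similar networks would call the same function once per group.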

  17. Clustering Gene Expression Data Based on Predicted Differential Effects of G × V Interaction

    Institute of Scientific and Technical Information of China (English)

    Hai-Yan Pan; Jun Zhu; Dan-Fu Han

    2005-01-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce biased interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of G × V (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted G × V interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

  18. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.

    Science.gov (United States)

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H

    2017-01-09

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively.

  19. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

    Science.gov (United States)

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.

    2017-01-01

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623

  20. Netter: re-ranking gene network inference predictions using structural network properties.

    Science.gov (United States)

    Ruyssinck, Joeri; Demeester, Piet; Dhaene, Tom; Saeys, Yvan

    2016-02-09

    Many algorithms have been developed to infer the topology of gene regulatory networks from gene expression data. These methods typically produce a ranking of links between genes with associated confidence scores, after which a certain threshold is chosen to produce the inferred topology. However, the structural properties of the predicted network do not resemble those typical for a gene regulatory network, as most algorithms only take into account connections found in the data and do not include known graph properties in their inference process. This lowers the prediction accuracy of these methods, limiting their usability in practice. We propose a post-processing algorithm which is applicable to any confidence ranking of regulatory interactions obtained from a network inference method which can use, inter alia, graphlets and several graph-invariant properties to re-rank the links into a more accurate prediction. To demonstrate the potential of our approach, we re-rank predictions of six different state-of-the-art algorithms using three simple network properties as optimization criteria and show that Netter can improve the predictions made on both artificially generated data as well as the DREAM4 and DREAM5 benchmarks. Additionally, the DREAM5 E. coli community prediction inferred from real expression data is further improved. Furthermore, Netter compares favorably to other post-processing algorithms and is not restricted to correlation-like predictions. Lastly, we demonstrate that the performance increase is robust for a wide range of parameter settings. Netter is available at http://bioinformatics.intec.ugent.be. Network inference from high-throughput data is a long-standing challenge. In this work, we present Netter, which can further refine network predictions based on a set of user-defined graph properties. Netter is a flexible system which can be applied in unison with any method producing a ranking from omics data. It can be tailored to specific prior

  1. Predictive gene signatures: molecular markers distinguishing colon adenomatous polyp and carcinoma.

    Science.gov (United States)

    Drew, Janice E; Farquharson, Andrew J; Mayer, Claus Dieter; Vase, Hollie F; Coates, Philip J; Steele, Robert J; Carey, Francis A

    2014-01-01

    Cancers exhibit abnormal molecular signatures associated with disease initiation and progression. Molecular signatures could improve cancer screening, detection, drug development and selection of appropriate drug therapies for individual patients. Typically only very small amounts of tissue are available from patients for analysis and biopsy samples exhibit broad heterogeneity that cannot be captured using a single marker. This report details application of an in-house custom designed GenomeLab System multiplex gene expression assay, the hCellMarkerPlex, to assess predictive gene signatures of normal, adenomatous polyp and carcinoma colon tissue using archived tissue bank material. The hCellMarkerPlex incorporates twenty-one gene markers: epithelial (EZR, KRT18, NOX1, SLC9A2), proliferation (PCNA, CCND1, MS4A12), differentiation (B4GANLT2, CDX1, CDX2), apoptotic (CASP3, NOX1, NTN1), fibroblast (FSP1, COL1A1), structural (ACTG2, CNN1, DES), gene transcription (HDAC1), stem cell (LGR5), endothelial (VWF) and mucin production (MUC2). Gene signatures distinguished normal, adenomatous polyp and carcinoma. Individual gene targets significantly contributing to molecular tissue types, classifier genes, were further characterised using real-time PCR, in-situ hybridisation and immunohistochemistry revealing aberrant epithelial expression of MS4A12, LGR5, CDX2, NOX1 and SLC9A2 prior to development of carcinoma. Identified gene signatures identify aberrant epithelial expression of genes prior to cancer development using in-house custom designed gene expression multiplex assays. This approach may be used to assist in objective classification of disease initiation, staging, progression and therapeutic responses using biopsy material.

  2. Predictive gene signatures: molecular markers distinguishing colon adenomatous polyp and carcinoma.

    Directory of Open Access Journals (Sweden)

    Janice E Drew

    Full Text Available Cancers exhibit abnormal molecular signatures associated with disease initiation and progression. Molecular signatures could improve cancer screening, detection, drug development and selection of appropriate drug therapies for individual patients. Typically only very small amounts of tissue are available from patients for analysis and biopsy samples exhibit broad heterogeneity that cannot be captured using a single marker. This report details application of an in-house custom designed GenomeLab System multiplex gene expression assay, the hCellMarkerPlex, to assess predictive gene signatures of normal, adenomatous polyp and carcinoma colon tissue using archived tissue bank material. The hCellMarkerPlex incorporates twenty-one gene markers: epithelial (EZR, KRT18, NOX1, SLC9A2), proliferation (PCNA, CCND1, MS4A12), differentiation (B4GANLT2, CDX1, CDX2), apoptotic (CASP3, NOX1, NTN1), fibroblast (FSP1, COL1A1), structural (ACTG2, CNN1, DES), gene transcription (HDAC1), stem cell (LGR5), endothelial (VWF) and mucin production (MUC2). Gene signatures distinguished normal, adenomatous polyp and carcinoma. Individual gene targets significantly contributing to molecular tissue types, classifier genes, were further characterised using real-time PCR, in-situ hybridisation and immunohistochemistry revealing aberrant epithelial expression of MS4A12, LGR5, CDX2, NOX1 and SLC9A2 prior to development of carcinoma. Identified gene signatures identify aberrant epithelial expression of genes prior to cancer development using in-house custom designed gene expression multiplex assays. This approach may be used to assist in objective classification of disease initiation, staging, progression and therapeutic responses using biopsy material.

  3. Predictive gene lists for breast cancer prognosis: A topographic visualisation study

    Directory of Open Access Journals (Sweden)

    Lowe David

    2008-04-01

    Full Text Available Abstract Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGL) of small selected subsets of genes from very large potential candidates as available in DNA microarray experiments is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, the Stochastic Neighbor Embedding (SNE) and the Locally Linear Embedding (LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70-dimensional PGLs per patient. Classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques and to investigate whether a-posteriori two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between two prognosis patients. Uncertainty and diversity across multiple gene expressions prevents unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results. Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high

  4. A sputum gene expression signature predicts oral corticosteroid response in asthma.

    Science.gov (United States)

    Berthon, Bronwyn S; Gibson, Peter G; Wood, Lisa G; MacDonald-Wicks, Lesley K; Baines, Katherine J

    2017-06-01

    Biomarkers that predict responses to oral corticosteroids (OCS) facilitate patient selection for asthma treatment. We hypothesised that asthma patients would respond differently to OCS therapy, with biomarkers and inflammometry predicting response. Adults with stable asthma underwent a randomised controlled cross-over trial of 50 mg prednisolone daily for 10 days (n=55). A six-gene expression biomarker signature (CLC, CPA3, DNASE1L3, IL1B, ALPL and CXCR2) in induced sputum, and eosinophils in blood and sputum were assessed and predictors of response were investigated (changes in forced expiratory volume in 1 s (ΔFEV1), six-item Asthma Control Questionnaire score (ΔACQ6) or exhaled nitric oxide fraction (ΔFeNO)). At baseline, responders to OCS (n=25) had upregulated mast cell CPA3 gene expression, poorer lung function, and higher sputum and blood eosinophils. Following treatment, CLC and CPA3 gene expression was reduced, whereas DNASE1L3, IL1B, ALPL and CXCR2 expression remained unchanged. Receiver operating characteristic (ROC) analysis showed the six-gene expression biomarker signature as a better predictor of clinically significant responses to OCS than blood and sputum eosinophils. The six-gene expression signature including eosinophil and Th2 related mast cell biomarkers showed greater precision in predicting OCS response in stable asthma. Thus, a novel sputum gene expression signature highlights an additional role of mast cells in asthma, and could be a useful measurement to guide OCS therapy in asthma. Copyright ©ERS 2017.

  5. In vitro gene regulatory networks predict in vivo function of liver

    Directory of Open Access Journals (Sweden)

    Ang Choo Y

    2010-11-01

    Full Text Available Abstract Background: Evolution of toxicity testing is predicated upon using in vitro cell based systems to rapidly screen and predict how a chemical might cause toxicity to an organ in vivo. However, the degree to which we can extend in vitro results to in vivo activity and possible mechanisms of action remains to be fully addressed. Results: Here we use the nitroaromatic 2,4,6-trinitrotoluene (TNT) as a model chemical to compare and determine how we might extrapolate from in vitro data to in vivo effects. We found 341 transcripts differentially expressed in common among in vitro and in vivo assays in response to TNT. The major functional term corresponding to these transcripts was cell cycle. Similarly modulated common pathways were identified between in vitro and in vivo. Furthermore, we uncovered the conserved common transcriptional gene regulatory networks between in vitro and in vivo cellular liver systems that responded to TNT exposure, which mainly contain 2 subnetwork modules: PTTG1 and PIR centered networks. Interestingly, all 7 genes in the PTTG1 module were involved in cell cycle and downregulated by TNT both in vitro and in vivo. Conclusions: The results of our investigation of TNT effects on gene expression in liver suggest that gene regulatory networks obtained from an in vitro system can predict in vivo function and mechanisms. Inhibiting PTTG1 and its targeted cell cycle related genes could be a key mechanism for TNT induced liver toxicity.

  6. In vitro gene regulatory networks predict in vivo function of liver

    Science.gov (United States)

    2010-01-01

    Background: Evolution of toxicity testing is predicated upon using in vitro cell based systems to rapidly screen and predict how a chemical might cause toxicity to an organ in vivo. However, the degree to which we can extend in vitro results to in vivo activity and possible mechanisms of action remains to be fully addressed. Results: Here we use the nitroaromatic 2,4,6-trinitrotoluene (TNT) as a model chemical to compare and determine how we might extrapolate from in vitro data to in vivo effects. We found 341 transcripts differentially expressed in common among in vitro and in vivo assays in response to TNT. The major functional term corresponding to these transcripts was cell cycle. Similarly modulated common pathways were identified between in vitro and in vivo. Furthermore, we uncovered the conserved common transcriptional gene regulatory networks between in vitro and in vivo cellular liver systems that responded to TNT exposure, which mainly contain 2 subnetwork modules: PTTG1 and PIR centered networks. Interestingly, all 7 genes in the PTTG1 module were involved in cell cycle and downregulated by TNT both in vitro and in vivo. Conclusions: The results of our investigation of TNT effects on gene expression in liver suggest that gene regulatory networks obtained from an in vitro system can predict in vivo function and mechanisms. Inhibiting PTTG1 and its targeted cell cycle related genes could be a key mechanism for TNT induced liver toxicity. PMID:21073692

  7. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects of acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of EOperon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.
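The comparison at the heart of this abstract, Pearson correlation between chromosomal neighbors versus randomly selected gene pairs, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the function names, the toy expression profiles, and the number of random pairs are hypothetical, and the FDR-based significance procedure used by the authors is not reproduced.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def neighbor_vs_random(expr, n_random=1000, seed=0):
    """Correlations of adjacent genes and of randomly drawn gene pairs.

    `expr` is a list of expression profiles ordered by chromosomal position.
    The random pairs form an empirical background distribution.
    """
    rng = random.Random(seed)
    neighbor = [pearson(expr[i], expr[i + 1]) for i in range(len(expr) - 1)]
    background = []
    for _ in range(n_random):
        i, j = rng.sample(range(len(expr)), 2)
        background.append(pearson(expr[i], expr[j]))
    return neighbor, background

# Hypothetical toy data: four genes measured in four conditions.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 1.0, 3.0, 2.0],
    [0.0, 5.0, 1.0, 7.0],
]
neighbor, background = neighbor_vs_random(expr, n_random=200)
```

A neighboring pair whose correlation lies far in the upper tail of `background` would be a candidate for a co-expressed cluster; controlling the FDR across all pairs is a separate step that this sketch leaves out.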

  8. A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

    Directory of Open Access Journals (Sweden)

    Ruzzo Walter L

    2006-03-01

    Full Text Available Abstract Background: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. Methods: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. Results: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that by combining different data sources, prediction accuracy can improve significantly. Conclusion: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.

  9. A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data.

    Science.gov (United States)

    Yao, Zizhen; Ruzzo, Walter L

    2006-03-20

    As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that by combining different data sources, prediction accuracy can improve significantly. Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.
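The idea of a similarity metric built as a weighted combination of base similarity measures, followed by a nearest-neighbor vote, can be sketched as below. This is an illustrative assumption, not the authors' implementation: the weights are supplied by hand rather than inferred by regression, the similarity-weighted vote stands in for the paper's own voting scheme, and all names and numbers are hypothetical.

```python
from collections import defaultdict

def combined_similarity(base_sims, weights):
    """Weighted sum of base similarity measures for one gene pair."""
    return sum(w * s for w, s in zip(weights, base_sims))

def knn_predict(target_sims, labels, weights, k=3):
    """Predict a class for the target gene from its k most similar neighbors.

    `target_sims[i]` holds the base similarities between the target gene and
    training gene i (for example, one value per data source).
    Returns (predicted_label, confidence), where confidence is the share of
    the total neighbor similarity that supports the winning label.
    """
    scored = sorted(
        ((combined_similarity(s, weights), lab) for s, lab in zip(target_sims, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for score, lab in scored:
        votes[lab] += score
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())

# Hypothetical example: two base measures (expression, sequence) per training gene.
target_sims = [(0.9, 0.2), (0.8, 0.1), (0.1, 0.9), (0.2, 0.8)]
labels = ["transport", "transport", "metabolism", "metabolism"]
label, confidence = knn_predict(target_sims, labels, weights=(0.7, 0.3), k=3)
```

With these weights the expression-based measure dominates, so the two "transport" genes rank first; shifting weight toward the second measure would change which neighbors are selected, which is why learning the weights from data matters.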

  10. Genomic prediction contributing to a promising global strategy to turbocharge gene banks.

    Science.gov (United States)

    Yu, Xiaoqing; Li, Xianran; Guo, Tingting; Zhu, Chengsong; Wu, Yuye; Mitchell, Sharon E; Roozeboom, Kraig L; Wang, Donghai; Wang, Ming Li; Pederson, Gary A; Tesso, Tesfaye T; Schnable, Patrick S; Bernardo, Rex; Yu, Jianming

    2016-10-03

    The 7.4 million plant accessions in gene banks are largely underutilized due to various resource constraints, but current genomic and analytic technologies are enabling us to mine this natural heritage. Here we report a proof-of-concept study to integrate genomic prediction into a broad germplasm evaluation process. First, a set of 962 biomass sorghum accessions were chosen as a reference set by germplasm curators. With high throughput genotyping-by-sequencing (GBS), we genetically characterized this reference set with 340,496 single nucleotide polymorphisms (SNPs). A set of 299 accessions was selected as the training set to represent the overall diversity of the reference set, and we phenotypically characterized the training set for biomass yield and other related traits. Cross-validation with multiple analytical methods using the data of this training set indicated high prediction accuracy for biomass yield. Empirical experiments with a 200-accession validation set chosen from the reference set confirmed high prediction accuracy. The potential to apply the prediction model to broader genetic contexts was also examined with an independent population. Detailed analyses on prediction reliability provided new insights into strategy optimization. The success of this project illustrates that a global, cost-effective strategy may be designed to assess the vast amount of valuable germplasm archived in 1,750 gene banks.

  11. Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer

    Directory of Open Access Journals (Sweden)

    Pingzhao Hu

    2006-01-01

    biological pathways. In particular, we observed that by integrating information from the insulin signalling pathway into our prediction model, we achieved better prediction of prostate cancer. Conclusions: Our data integration methodology provides an efficient way to identify biologically sound and statistically significant pathways from gene expression data. The significant gene expression phenotypes identified in our study have the potential to characterize complex genetic alterations in prostate cancer.

  12. Gene expression profiling to predict the risk of locoregional recurrence in breast cancer: a pooled analysis.

    Science.gov (United States)

    Drukker, C A; Elias, S G; Nijenhuis, M V; Wesseling, J; Bartelink, H; Elkhuizen, P; Fowble, B; Whitworth, P W; Patel, R R; de Snoo, F A; van 't Veer, L J; Beitsch, P D; Rutgers, E J Th

    2014-12-01

    The 70-gene signature (MammaPrint) has been developed to predict the risk of distant metastases in breast cancer and select those patients who may benefit from adjuvant treatment. Given the strong association between locoregional and distant recurrence, we hypothesize that the 70-gene signature will also be able to predict the risk of locoregional recurrence (LRR). 1,053 breast cancer patients primarily treated with breast-conserving treatment or mastectomy at the Netherlands Cancer Institute between 1984 and 2006 were included. Adjuvant treatment consisted of radiotherapy, chemotherapy, and/or endocrine therapy as indicated by guidelines used at the time. All patients were included in various 70-gene signature validation studies. After a median follow-up of 8.96 years with 87 LRRs, patients with a high-risk 70-gene signature (n = 492) had an LRR risk of 12.6% (95% CI 9.7-15.8) at 10 years, compared to 6.1% (95% CI 4.1-8.5) for low-risk patients (n = 561; P risk model for the clinicopathological factors such as age, tumour size, grade, hormone receptor status, LVI, axillary lymph node involvement, surgical treatment, endocrine treatment, and chemotherapy resulted in a multivariable HR of 1.73 (95% CI 1.02-2.93; P = 0.042). Adding the signature to the model based on clinicopathological factors improved the discrimination, albeit non-significantly [C-index through 10 years changed from 0.731 (95% CI 0.682-0.782) to 0.741 (95% CI 0.693-0.790)]. Calibration of the prognostic models was excellent. The 70-gene signature is an independent prognostic factor for LRR. A significantly lower local recurrence risk was seen in patients with a low-risk 70-gene signature compared to those with high-risk 70-gene signature.

  13. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen.

    Science.gov (United States)

    Ma, Xiao-Jun; Wang, Zuncai; Ryan, Paula D; Isakoff, Steven J; Barmettler, Anne; Fuller, Andrew; Muir, Beth; Mohapatra, Gayatry; Salunga, Ranelle; Tuggle, J Todd; Tran, Yen; Tran, Diem; Tassin, Ana; Amon, Paul; Wang, Wilson; Wang, Wei; Enright, Edward; Stecker, Kimberly; Estepa-Sabal, Eden; Smith, Barbara; Younger, Jerry; Balis, Ulysses; Michaelson, James; Bhan, Atul; Habin, Karleen; Baer, Thomas M; Brugge, Joan; Haber, Daniel A; Erlander, Mark G; Sgroi, Dennis C

    2004-06-01

    Tamoxifen significantly reduces tumor recurrence in certain patients with early-stage estrogen receptor-positive breast cancer, but markers predictive of treatment failure have not been identified. Here, we generated gene expression profiles of hormone receptor-positive primary breast cancers in a set of 60 patients treated with adjuvant tamoxifen monotherapy. An expression signature predictive of disease-free survival was reduced to a two-gene ratio, HOXB13 versus IL17BR, which outperformed existing biomarkers. Ectopic expression of HOXB13 in MCF10A breast epithelial cells enhances motility and invasion in vitro, and its expression is increased in both preinvasive and invasive primary breast cancer. The HOXB13:IL17BR expression ratio may be useful for identifying patients appropriate for alternative therapeutic regimens in early-stage breast cancer.
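    As a rough illustration of how a two-gene ratio can act as a classifier, the sketch below computes a log2 expression ratio and compares it with a cutoff. The function names, the example expression values, and the default cutoff of 0.0 are hypothetical; the abstract does not state how the ratio was normalized or which threshold was used.

```python
import math


def log2_ratio(hoxb13: float, il17br: float) -> float:
    """Log2 ratio of two positive, linear-scale expression values."""
    if hoxb13 <= 0 or il17br <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(hoxb13 / il17br)


def classify(hoxb13: float, il17br: float, cutoff: float = 0.0) -> str:
    """Label a sample by comparing its log2 ratio with a cutoff.

    The cutoff is a placeholder for illustration, not a published threshold.
    """
    return "high ratio" if log2_ratio(hoxb13, il17br) > cutoff else "low ratio"
```

    In practice a cutoff like this would have to be chosen on a training cohort and checked on independent samples.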

  14. A bioinformatics tool for linking gene expression profiling results with public databases of microRNA target predictions

    Science.gov (United States)

    Creighton, Chad J.; Nagaraja, Ankur K.; Hanash, Samir M.; Matzuk, Martin M.; Gunaratne, Preethi H.

    2008-01-01

    MicroRNAs are short (∼22 nucleotides) noncoding RNAs that regulate the stability and translation of mRNA targets. A number of computational algorithms have been developed to help predict which microRNAs are likely to regulate which genes. Gene expression profiling of biological systems where microRNAs might be active can yield hundreds of differentially expressed genes. The commonly used public microRNA target prediction databases facilitate gene-by-gene searches. However, integration of microRNA–mRNA target predictions with gene expression data on a large scale using these databases is currently cumbersome and time consuming for many researchers. We have developed a desktop software application which, for a given target prediction database, retrieves all microRNA:mRNA functional pairs represented by an experimentally derived set of genes. Furthermore, for each microRNA, the software computes an enrichment statistic for overrepresentation of predicted targets within the gene set, which could help to implicate roles for specific microRNAs and microRNA-regulated genes in the system under study. Currently, the software supports searching of results from PicTar, TargetScan, and miRanda algorithms. In addition, the software can accept any user-defined set of gene-to-class associations for searching, which can include the results of other target prediction algorithms, as well as gene annotation or gene-to-pathway associations. A search (using our software) of genes transcriptionally regulated in vitro by estrogen in breast cancer uncovered numerous targeting associations for specific microRNAs—above what could be observed in randomly generated gene lists—suggesting a role for microRNAs in mediating the estrogen response. The software and Excel VBA source code are freely available at http://sigterms.sourceforge.net. PMID:18812437
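    The enrichment statistic mentioned above can be illustrated with a one-sided hypergeometric test, a common choice for overrepresentation analysis. This is a minimal sketch under that assumption; the abstract does not say which statistic the software actually computes, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(universe: int, targets_in_universe: int,
                       gene_set_size: int, targets_in_set: int) -> float:
    """Upper-tail hypergeometric probability P(X >= targets_in_set).

    universe            -- number of genes that could have been selected
    targets_in_universe -- predicted targets of one microRNA in the universe
    gene_set_size       -- size of the experimentally derived gene set
    targets_in_set      -- predicted targets observed in that gene set
    """
    total = comb(universe, gene_set_size)
    upper = min(gene_set_size, targets_in_universe)
    tail = sum(
        comb(targets_in_universe, k)
        * comb(universe - targets_in_universe, gene_set_size - k)
        for k in range(targets_in_set, upper + 1)
    )
    return tail / total
```

    A small p-value suggests that the microRNA's predicted targets appear in the gene set more often than expected by chance; with many microRNAs tested, a multiple-testing correction would normally follow.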

  15. A bioinformatics tool for linking gene expression profiling results with public databases of microRNA target predictions.

    Science.gov (United States)

    Creighton, Chad J; Nagaraja, Ankur K; Hanash, Samir M; Matzuk, Martin M; Gunaratne, Preethi H

    2008-11-01

    MicroRNAs are short (approximately 22 nucleotides) noncoding RNAs that regulate the stability and translation of mRNA targets. A number of computational algorithms have been developed to help predict which microRNAs are likely to regulate which genes. Gene expression profiling of biological systems where microRNAs might be active can yield hundreds of differentially expressed genes. The commonly used public microRNA target prediction databases facilitate gene-by-gene searches. However, integration of microRNA-mRNA target predictions with gene expression data on a large scale using these databases is currently cumbersome and time consuming for many researchers. We have developed a desktop software application which, for a given target prediction database, retrieves all microRNA:mRNA functional pairs represented by an experimentally derived set of genes. Furthermore, for each microRNA, the software computes an enrichment statistic for overrepresentation of predicted targets within the gene set, which could help to implicate roles for specific microRNAs and microRNA-regulated genes in the system under study. Currently, the software supports searching of results from PicTar, TargetScan, and miRanda algorithms. In addition, the software can accept any user-defined set of gene-to-class associations for searching, which can include the results of other target prediction algorithms, as well as gene annotation or gene-to-pathway associations. A search (using our software) of genes transcriptionally regulated in vitro by estrogen in breast cancer uncovered numerous targeting associations for specific microRNAs-above what could be observed in randomly generated gene lists-suggesting a role for microRNAs in mediating the estrogen response. The software and Excel VBA source code are freely available at http://sigterms.sourceforge.net.

  16. Dynamics of the Transcriptome during Human Spermatogenesis: Predicting the Potential Key Genes Regulating Male Gametes Generation.

    Science.gov (United States)

    Zhu, Zijue; Li, Chong; Yang, Shi; Tian, Ruhui; Wang, Junlong; Yuan, Qingqing; Dong, Hui; He, Zuping; Wang, Shengyue; Li, Zheng

    2016-01-12

    Many infertile men suffer from spermatogenesis disorders. However, conventional clinical tests cannot provide sufficient information on the causes of a spermatogenesis disorder or guide the doctor in treating it. More effective diagnostic and treatment methods could be developed if the key genes that regulate spermatogenesis were determined. Much work has been done on animal models, whereas there are few studies in humans owing to limited sample resources. In the current work, testis tissues were obtained from 27 patients with obstructive azoospermia via surgery. A combination of fluorescence-activated cell sorting and magnetic-activated cell sorting was used to sort typical germ cells during spermatogenesis. RNA sequencing was carried out to screen for changes in the transcriptomic profile of the germ cells during spermatogenesis. Differentially expressed genes were clustered according to their expression patterns. Gene Ontology annotation, pathway analysis, and Gene Set Enrichment Analysis were carried out on genes with specific expression patterns, and potential key genes involved in the regulation of spermatogenesis, such as HOXs, JUN, SP1, and TCF3, were predicted; these may serve as molecular tools for clinical purposes.

  17. Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2006-07-01

    Background: A complete understanding of the regulatory mechanisms of gene expression is the next important issue of genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies involved the use of yeast which has much simpler regulatory networks than human and has many genome wide binding data and gene expression data under diverse conditions. Studies of genome wide transcriptional networks of human genomes currently lag behind those of yeast. Results: We report herein a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. The Z scores from the application of gene set analysis with gene sets of transcription factor binding sites (TFBSs) were successfully used to represent the activity of TFBSs in a given microarray data set. A significant correlation between the Z scores of gene sets of TFBSs and individual genes across multiple conditions permitted successful identification of many known human transcriptional regulatory elements of genes as well as the prediction of numerous putative TFBSs of many genes which will constitute a good starting point for further experiments. Using Z scores of gene sets of TFBSs produced better predictions than the use of mRNA levels of a transcription factor itself, suggesting that the Z scores of gene sets of TFBSs better represent diverse mechanisms for changing the activity of transcription factors in the cell. In addition, cis-regulatory modules, combinations of co-acting TFBSs, were readily identified by our analysis. Conclusion: By a strategic combination of gene set level analysis of gene expression data sets and promoter analysis, we were able to identify and predict many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding
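    One common way to obtain a gene-set Z score of the kind described above is to compare the mean expression change of the genes in a set with the mean over all genes, scaled by the overall standard deviation and the set size. The sketch below assumes that parametric form; the abstract does not give the exact formula, and the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def gene_set_z(scores: dict[str, float], gene_set: set[str]) -> float:
    """Z score of a gene set relative to all measured genes.

    scores   -- per-gene values for one condition (e.g. log fold changes)
    gene_set -- genes sharing a feature, such as one TFBS in their promoters
    """
    all_scores = list(scores.values())
    members = [scores[g] for g in gene_set if g in scores]
    if not members:
        raise ValueError("no gene-set members found in scores")
    sigma = pstdev(all_scores)
    if sigma == 0:
        raise ValueError("scores have zero variance")
    return (mean(members) - mean(all_scores)) * sqrt(len(members)) / sigma
```

    Computing this score for each TFBS gene set in each condition gives an activity profile per TFBS, which can then be correlated with the expression profile of individual genes across conditions.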

  18. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification

    DEFF Research Database (Denmark)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production of such compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features, including prediction of gene cluster boundaries using the ClusterFinder method or the newly integrated CASSIS algorithm, improved substrate specificity prediction for non-ribosomal peptide synthetase adenylation domains based on the new SANDPUMA algorithm, improved predictions for terpene and ribosomally...

  19. Improvements to previous algorithms to predict gene structure and isoform concentrations using Affymetrix Exon arrays

    Directory of Open Access Journals (Sweden)

    Aramburu Ander

    2010-11-01

    Background: Exon arrays provide a way to measure the expression of different isoforms of genes in an organism. Most of the procedures to deal with these arrays are focused on gene expression or on exon expression. Although the only biological analytes that can be properly assigned a concentration are transcripts, there are very few algorithms that focus on them. The reason is that previously developed summarization methods do not work well if applied to transcripts. In addition, gene structure prediction, i.e., the correspondence between probes and novel isoforms, is a field which is still unexplored. Results: We have modified and adapted a previous algorithm to take advantage of the special characteristics of the Affymetrix exon arrays. The structure and concentration of transcripts (some of them possibly unknown) in microarray experiments were predicted using this algorithm. Simulations showed that the suggested modifications improved both specificity (SP) and sensitivity (ST) of the predictions. The algorithm was also applied to different real datasets, showing its effectiveness and its concordance with PCR-validated results. Conclusions: The proposed algorithm shows a substantial improvement in performance over the previous version. This improvement is mainly due to the exploitation of the redundancy of the Affymetrix exon arrays. An R package of SPACE with the updated algorithms has been developed and is freely available.

  20. Development and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles

    Directory of Open Access Journals (Sweden)

    Yingdong Zhao

    2010-05-01

    There have been relatively few publications using linear regression models to predict a continuous response based on microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We have evaluated three linear regression algorithms that can be used for the prediction of a continuous response based on high dimensional gene expression data. The three algorithms are the least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods are tested using simulations based on a real gene expression dataset and analyses of two sets of real gene expression data and using an unbiased complete cross validation approach. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, but both of them perform more efficiently than the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and the LASSO algorithms with complete cross-validation.
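    To make the LASSO idea concrete, the sketch below fits an L1-penalized linear model by cyclic coordinate descent using only the standard library. It is an illustration rather than the BRB-ArrayTools implementation: the penalty value `lam`, the absence of an intercept, and the fixed iteration count are simplifying assumptions, and neither LAR, ALM, nor the cross-validation step is shown.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 without an intercept.

    X is a list of rows (samples) and y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero predictor carries no information
            # Correlation of predictor j with the partial residual
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            residual = [r - v * delta for v, r in zip(col, residual)]
            beta[j] = new_bj
    return beta
```

    The soft-thresholding step is what sets weak predictors exactly to zero, which is why the method remains usable when the number of genes exceeds the number of samples. In a complete cross-validation, the penalty would be selected inside each training fold rather than on the full data.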

  1. Semi-supervised prediction of gene regulatory networks using machine learning algorithms

    Indian Academy of Sciences (India)

    Nihir Patel; T L Wang

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  2. Use of tiling array data and RNA secondary structure predictions to identify noncoding RNA genes

    DEFF Research Database (Denmark)

    Weile, Christian; Gardner, Paul P; Hedegaard, Mads M

    2007-01-01

    BACKGROUND: Within the last decade a large number of noncoding RNA genes have been identified, but this may only be the tip of the iceberg. Using comparative genomics a large number of sequences that have signals concordant with conserved RNA secondary structures have been discovered in the human...... genome. Moreover, genome wide transcription profiling with tiling arrays indicate that the majority of the genome is transcribed. RESULTS: We have combined tiling array data with genome wide structural RNA predictions to search for novel noncoding and structural RNA genes that are expressed in the human...... of 3 of the hairpin structures and 3 out of 9 high covariance structures in SK-N-AS cells. CONCLUSION: Our results demonstrate that many human noncoding, structured and conserved RNA genes remain to be discovered and that tissue specific tiling array data can be used in combination with computational...

  3. Gene Expression-Based Survival Prediction in Lung Adenocarcinoma: A Multi-Site, Blinded Validation Study

    Science.gov (United States)

    Shedden, Kerby; Taylor, Jeremy M.G.; Enkemann, Steve A.; Tsao, Ming S.; Yeatman, Timothy J.; Gerald, William L.; Eschrich, Steve; Jurisica, Igor; Venkatraman, Seshan E.; Meyerson, Matthew; Kuick, Rork; Dobbin, Kevin K.; Lively, Tracy; Jacobson, James W.; Beer, David G.; Giordano, Thomas J.; Misek, David E.; Chang, Andrew C.; Zhu, Chang Qi; Strumpf, Dan; Hanash, Samir; Shepherd, Francis A.; Ding, Kuyue; Seymour, Lesley; Naoki, Katsuhiko; Pennell, Nathan; Weir, Barbara; Verhaak, Roel; Ladd-Acosta, Christine; Golub, Todd; Gruidl, Mike; Szoke, Janos; Zakowski, Maureen; Rusch, Valerie; Kris, Mark; Viale, Agnes; Motoi, Noriko; Travis, William; Sharma, Anupama

    2009-01-01

    Although prognostic gene expression signatures for survival in early stage lung cancer have been proposed, for clinical application it is critical to establish their performance across different subject populations and in different laboratories. Here we report a large, training-testing, multi-site blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses proposed examined whether microarray measurements of gene expression either alone or combined with basic clinical covariates (stage, age, sex) can be used to predict overall survival in lung cancer subjects. Several models examined produced risk scores that substantially correlated with actual subject outcome. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early stage lung cancer. This study also provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas. PMID:18641660

  4. Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors

    Directory of Open Access Journals (Sweden)

    László Patthy

    2011-07-01

    In view of the fact that appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is a growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however, it is usually ignored that a significant proportion of Metazoan sequences analyzed is mispredicted and that this may seriously affect the validity of the conclusions. To estimate the contribution of errors in gene prediction to differences in DA of predicted proteins, we have used the high quality manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of domain architectures of predicted proteins we focused on RefSeq, EnsEMBL and NCBI’s GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast with this, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than in the case of the comparison of Swiss-Prot entries. Analysis of RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences with DAs different from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol. We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences

  5. Interactome of Radiation-Induced microRNA-Predicted Target Genes

    Directory of Open Access Journals (Sweden)

    Tenzin W. Lhakhang

    2012-01-01

    MicroRNAs (miRNAs) function as global negative regulators of gene expression and have been associated with a multitude of biological processes. The dysfunction of the microRNAome has been linked to various diseases including cancer. Our laboratory recently reported modulation in the expression of miRNA in a variety of cell types exposed to ionizing radiation (IR). To further understand the role of miRNAs in IR-induced stress pathways, we catalogued a set of common miRNAs modulated in various irradiated cell lines and generated a list of predicted target genes. Using advanced bioinformatics tools we identified cellular pathways where miRNA predicted target genes function. The miRNA-targeted genes were found to play key roles in previously identified IR stress pathways such as cell cycle, p53 pathway, TGF-beta pathway, ubiquitin-mediated proteolysis, focal adhesion pathway, MAPK signaling, thyroid cancer pathway, adherens junction, insulin signaling pathway, oocyte meiosis, regulation of actin cytoskeleton, and renal cell carcinoma pathway. Interestingly, we were able to identify novel targeted pathways that have not been identified in cellular radiation response, such as aldosterone-regulated sodium reabsorption, long-term potentiation, and neurotrophin signaling pathways. Our analysis indicates that the miRNA interactome in irradiated cells provides a platform for comprehensive modeling of the cellular stress response to IR exposure.

  6. Hybrid models identified a 12-gene signature for lung cancer prognosis and chemoresponse prediction.

    Directory of Open Access Journals (Sweden)

    Ying-Wooi Wan

    Lung cancer remains the leading cause of cancer-related deaths worldwide. The recurrence rate ranges from 35-50% among early stage non-small cell lung cancer patients. To date, there is no fully-validated and clinically applied prognostic gene signature for personalized treatment. From genome-wide mRNA expression profiles generated on 256 lung adenocarcinoma patients, a 12-gene signature was identified using combinatorial gene selection methods, and a risk score algorithm was developed with Naïve Bayes. The 12-gene model generates significant patient stratification in the training cohort HLM & UM (n = 256; log-rank P = 6.96e-7) and two independent validation sets, MSK (n = 104; log-rank P = 9.88e-4) and DFCI (n = 82; log-rank P = 2.57e-4), using Kaplan-Meier analyses. This gene signature also stratifies stage I and IB lung adenocarcinoma patients into two distinct survival groups (log-rank P<0.04). The 12-gene risk score is more significant (hazard ratio = 4.19, 95% CI: [2.08, 8.46]) than other commonly used clinical factors except tumor stage (III vs. I) in multivariate Cox analyses. The 12-gene model is more accurate than previously published lung cancer gene signatures on the same datasets. Furthermore, this signature accurately predicts chemoresistance/chemosensitivity to Cisplatin, Carboplatin, Paclitaxel, Etoposide, Erlotinib, and Gefitinib in NCI-60 cancer cell lines (P<0.017). The identified 12 genes exhibit curated interactions with major lung cancer signaling hallmarks in functional pathway analysis. The expression patterns of the signature genes have been confirmed in RT-PCR analyses of independent tumor samples. The results demonstrate the clinical utility of the identified gene signature in prognostic categorization. With this 12-gene risk score algorithm, early stage patients at high risk for tumor recurrence could be identified for adjuvant chemotherapy; whereas stage I and II patients at low risk could be spared the toxic side effects of

  7. iFish: predicting the pathogenicity of human nonsynonymous variants using gene-specific/family-specific attributes and classifiers.

    Science.gov (United States)

    Wang, Meng; Wei, Liping

    2016-08-16

    Accurate prediction of the pathogenicity of genomic variants, especially nonsynonymous single nucleotide variants (nsSNVs), is essential in biomedical research and clinical genetics. Most current prediction methods build a generic classifier for all genes. However, different genes and gene families have different features. We investigated whether gene-specific and family-specific customized classifiers could improve prediction accuracy. Customized gene-specific and family-specific attributes were selected with AIC, BIC, and LASSO, and Support Vector Machine classifiers were generated for 254 genes and 152 gene families, covering a total of 5,985 genes. Our results showed that the customized attributes reflected key features of the genes and gene families, and the customized classifiers achieved higher prediction accuracy than the generic classifier. The customized classifiers and the generic classifier for other genes and families were integrated into a new tool named iFish (integrated Functional inference of SNVs in human, http://ifish.cbi.pku.edu.cn). iFish outperformed other methods on benchmark datasets as well as on prioritization of candidate causal variants from whole exome sequencing. iFish provides a user-friendly web-based interface and supports other functionalities such as integration of genetic evidence. iFish would facilitate high-throughput evaluation and prioritization of nsSNVs in human genetics research.

  8. Predicting gene function using hierarchical multi-label decision tree ensembles

    Directory of Open Access Journals (Sweden)

    Kocev Dragi

    2010-01-01

    Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

  9. [A novel method of the genome-wide prediction for the target genes and its application].

    Science.gov (United States)

    Zhang, Jing-Jing; Feng, Jing; Zhu, Ying-Guo; Li, Yang-Sheng

    2006-10-01

    Based on the protein databases of several model species, this study developed a new method for the genome-wide prediction of target genes, using a hidden Markov model implemented in Perl. The advantages of this method are high throughput, high quality and ease of prediction, especially for multi-domain protein families. With this method, we predicted the PPR and TPR protein families in the whole genomes of several model species. There were 536 PPR proteins and 199 TPR proteins in Oryza sativa ssp. japonica, 519 PPR proteins and 177 TPR proteins in Oryza sativa L. ssp. indica, 735 PPR proteins and 292 TPR proteins in Arabidopsis thaliana, and 6 PPR proteins and 32 TPR proteins in Cyanidioschyzon merolae. Synechococcus and Thermophilic archaebacterium did not have PPR proteins. By contrast, 10 TPR proteins were found in Synechococcus and 4 TPR proteins were found in Thermophilic archaebacterium. Moreover, some further bioinformatics analyses were conducted on these results.

  10. Striatal and extrastriatal dopamine D2 receptor occupancy by a novel antipsychotic, blonanserin: a PET study with [11C]raclopride and [11C]FLB 457 in schizophrenia.

    Science.gov (United States)

    Tateno, Amane; Arakawa, Ryosuke; Okumura, Masaki; Fukuta, Hajime; Honjo, Kazuyoshi; Ishihara, Keiichi; Nakamura, Hiroshi; Kumita, Shin-ichiro; Okubo, Yoshiro

    2013-04-01

    Blonanserin is a novel antipsychotic with high affinities for dopamine D(2) and 5-HT(2A) receptors, and it was recently approved for the treatment of schizophrenia in Japan and Korea. Although double-blind clinical trials have demonstrated that blonanserin has equal efficacy to risperidone, and with a better profile especially with respect to prolactin elevation, its profile of in vivo receptor binding has not been investigated in patients with schizophrenia. Using positron emission tomography (PET), we measured striatal and extrastriatal dopamine D(2) receptor occupancy by blonanserin in 15 patients with schizophrenia treated with fixed doses of blonanserin (ie, 8, 16, and 24 mg/d) for at least 4 weeks before PET scans, and in 15 healthy volunteers. Two PET scans, 1 with [(11)C]raclopride for the striatum and 1 with [(11)C]FLB 457 for the temporal cortex and pituitary, were performed on the same day. Striatal dopamine D(2) receptor occupancy by blonanserin was 60.8% (3.0%) [mean (SD)] at 8 mg, 73.4% (4.9%) at 16 mg, and 79.7% (2.3%) at 24 mg. The brain/plasma concentration ratio calculated from D(2) receptor occupancy in the temporal cortex and pituitary was 3.38, indicating good blood-brain barrier permeability. This was the first study to show clinical daily dose amounts of blonanserin occupying dopamine D(2) receptors in patients with schizophrenia. The clinical implications obtained in this study were the optimal therapeutic dose range of 12.9 to 22.1 mg/d of blonanserin required for 70% to 80% dopamine D(2) receptor occupancy in the striatum, and the good blood-brain barrier permeability that suggested a relatively lower risk of hyperprolactinemia.
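    The reported dose range is consistent with a simple one-site occupancy model, occupancy = D / (D + ED50), with maximal occupancy fixed at 100%. That model form is an assumption made here for illustration; the abstract does not state which model the authors fitted. The sketch below back-calculates ED50 from the two reported endpoints (70% at 12.9 mg/d, 80% at 22.1 mg/d) and predicts occupancy at other doses.

```python
def ed50_from_point(dose: float, occupancy: float) -> float:
    """Solve occupancy = dose / (dose + ED50) for ED50 (occupancy in 0..1)."""
    if not 0.0 < occupancy < 1.0:
        raise ValueError("occupancy must lie strictly between 0 and 1")
    return dose * (1.0 - occupancy) / occupancy


def predicted_occupancy(dose: float, ed50: float) -> float:
    """Fractional receptor occupancy under the one-site model."""
    return dose / (dose + ed50)


# Both reported endpoints imply nearly the same ED50 (about 5.5 mg/d).
ed50_low = ed50_from_point(12.9, 0.70)
ed50_high = ed50_from_point(22.1, 0.80)
```

    With an ED50 near 5.5 mg/d, this model predicts roughly 59%, 74% and 81% occupancy at 8, 16 and 24 mg/d, close to the measured striatal means of 60.8%, 73.4% and 79.7%.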

  11. A New Drug Combinatory Effect Prediction Algorithm on the Cancer Cell Based on Gene Expression and Dose-Response Curve.

    Science.gov (United States)

    Goswami, C Pankaj; Cheng, L; Alexander, P S; Singal, A; Li, L

    2015-02-01

    Gene expression data before and after treatment with an individual drug and the IC20 of dose-response data were utilized to predict two drugs' interaction effects on a diffuse large B-cell lymphoma (DLBCL) cancer cell. A novel drug interaction scoring algorithm was developed to account for either synergistic or antagonistic effects between drug combinations. Different core gene selection schemes were investigated, which included the whole gene set, the drug-sensitive gene set, the drug-sensitive minus drug-resistant gene set, and the known drug target gene set. The prediction scores were compared with the observed drug interaction data at 6, 12, and 24 hours with a probability concordance (PC) index. The test result shows that the concordance between observed and predicted drug interaction ranking reaches a PC index of 0.605. The scoring reliability and efficiency were further confirmed in five drug interaction studies published in the GEO database.
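    A ranking concordance of this kind can be illustrated with a generic pairwise index: the fraction of drug-pair comparisons in which the predicted scores order two combinations the same way as the observed interaction data. The sketch below is that generic index, not necessarily the authors' PC index, whose exact definition is not given in the abstract; the function name and tie handling are assumptions.

```python
from itertools import combinations


def pairwise_concordance(observed, predicted):
    """Fraction of comparable pairs ranked in the same order by both lists.

    Pairs tied in `observed` are skipped; pairs tied only in `predicted`
    count as half concordant.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(observed)), 2):
        d_obs = observed[i] - observed[j]
        if d_obs == 0:
            continue
        comparable += 1
        d_pred = predicted[i] - predicted[j]
        if d_pred == 0:
            concordant += 0.5
        elif (d_obs > 0) == (d_pred > 0):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

    Under this definition a value of 1.0 means identical rankings and 0.5 is what uninformative predictions would give on average.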

  12. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds.

    Directory of Open Access Journals (Sweden)

    Howard Y Chang

    2004-02-01

    Full Text Available Cancer invasion and metastasis have been likened to wound healing gone awry. Despite parallels in cellular behavior between cancer progression and wound healing, the molecular relationships between these two processes and their prognostic implications are unclear. In this study, based on gene expression profiles of fibroblasts from ten anatomic sites, we identify a stereotyped gene expression program in response to serum exposure that appears to reflect the multifaceted role of fibroblasts in wound healing. The genes comprising this fibroblast common serum response are coordinately regulated in many human tumors, allowing us to identify tumors with gene expression signatures suggestive of active wounds. Genes induced in the fibroblast serum-response program are expressed in tumors by the tumor cells themselves, by tumor-associated fibroblasts, or both. The molecular features that define this wound-like phenotype are evident at an early clinical stage, persist during treatment, and predict increased risk of metastasis and death in breast, lung, and gastric carcinomas. Thus, the transcriptional signature of the response of fibroblasts to serum provides a possible link between cancer progression and wound healing, as well as a powerful predictor of the clinical course in several common carcinomas.

  13. Four genes predict high risk of progression from smoldering to symptomatic multiple myeloma (SWOG S0120).

    Science.gov (United States)

    Khan, Rashid; Dhodapkar, Madhav; Rosenthal, Adam; Heuck, Christoph; Papanikolaou, Xenofon; Qu, Pingping; van Rhee, Frits; Zangari, Maurizio; Jethava, Yogesh; Epstein, Joshua; Yaccoby, Shmuel; Hoering, Antje; Crowley, John; Petty, Nathan; Bailey, Clyde; Morgan, Gareth; Barlogie, Bart

    2015-09-01

    Multiple myeloma is preceded by an asymptomatic phase, comprising monoclonal gammopathy of uncertain significance and smoldering myeloma. Compared to the former, smoldering myeloma has a higher and non-uniform rate of progression to clinical myeloma, reflecting a subset of patients with higher risk. We evaluated the gene expression profile of smoldering myeloma plasma cells among 105 patients enrolled in a prospective observational trial at our institution, with a view to identifying a high-risk signature. Baseline clinical, bone marrow, cytogenetic and radiologic data were evaluated for their potential to predict time to therapy for symptomatic myeloma. A gene signature derived from four genes, at an optimal binary cut-point of 9.28, identified 14 patients (13%) with a 2-year therapy risk of 85.7%. Conversely, a low four-gene score (probe sets showed concordance with indices of chromosome instability. These data demonstrate high discriminatory power of a gene-based assay and suggest a role for dysregulation of mitotic checkpoints in the context of genomic instability as a hallmark of high-risk smoldering myeloma.

  14. Indole-Diterpene Biosynthetic Capability of Epichloë Endophytes as Predicted by ltm Gene Analysis

    Science.gov (United States)

    Young, Carolyn A.; Tapper, Brian A.; May, Kimberley; Moon, Christina D.; Schardl, Christopher L.; Scott, Barry

    2009-01-01

    Bioprotective alkaloids produced by Epichloë and closely related asexual Neotyphodium fungal endophytes protect their grass hosts from insect and mammalian herbivory. One class of these compounds, known for antimammalian toxicity, is the indole-diterpenes. The LTM locus of Neotyphodium lolii (Lp19) and Epichloë festucae (Fl1), required for the biosynthesis of the indole-diterpene lolitrem, consists of 10 ltm genes. We have used PCR and Southern analysis to screen a broad taxonomic range of 44 endophyte isolates to determine why indole-diterpenes are present in so few endophyte-grass associations in comparison to the other bioprotective alkaloids, which are more widespread among the endophytes. All 10 ltm genes were present in only three epichloë endophytes. A predominance of the asexual Neotyphodium spp. examined contained 8 of the 10 ltm genes, with only one N. lolii containing the entire LTM locus and the ability to produce lolitrems. Liquid chromatography-tandem mass spectrometry profiles of indole-diterpenes from a subset of endophyte-infected perennial ryegrass showed that endophytes that contained functional genes present in ltm clusters 1 and 2 were capable of producing simple indole-diterpenes such as paspaline, 13-desoxypaxilline, and terpendoles, compounds predicted to be precursors of lolitrem B. Analysis of toxin biosynthesis genes by PCR now enables a diagnostic method to screen endophytes for both beneficial and detrimental alkaloids and can be used as a resource for screening isolates required for forage improvement. PMID:19181837

  15. Identification of prognostic genes for recurrent risk prediction in triple negative breast cancer patients in Taiwan.

    Directory of Open Access Journals (Sweden)

    Lee H Chen

    Full Text Available Discrepancies in the prognosis of triple negative breast cancer exist between Caucasian and Asian populations. Yet, the gene signature of triple negative breast cancer specifically for Asians has not become available. Therefore, the purpose of this study is to construct a prediction model for recurrence of triple negative breast cancer in Taiwanese patients. Whole genome expression profiling of breast cancers from 185 patients in Taiwan from 1995 to 2008 was performed, and the results were compared to the previously published literature to detect differences between Asian and Western patients. Pathway analysis and Cox proportional hazard models were applied to construct a prediction model for the recurrence of triple negative breast cancer. Hierarchical cluster analysis showed that triple negative breast cancers from different races were in separate sub-clusters but grouped in a bigger cluster. Two pathways, cAMP-mediated signaling and ephrin receptor signaling, were significantly associated with the recurrence of triple negative breast cancer. After using stepwise model selection from the combination of the initial filtered genes, we developed a prediction model based on the genes SLC22A23, PRKAG3, DPEP3, MORC2, GRB7, and FAM43A. The model had 91.7% accuracy, 81.8% sensitivity, and 94.6% specificity under leave-one-out support vector regression. In this study, we identified pathways related to triple negative breast cancer and developed a model to predict its recurrence. These results could be used for assisting with clinical prognosis and warrant further investigation into the possibility of targeted therapy of triple negative breast cancer in Taiwanese patients.

  16. A 50-gene intrinsic subtype classifier for prognosis and prediction of benefit from adjuvant tamoxifen.

    Science.gov (United States)

    Chia, Stephen K; Bramwell, Vivien H; Tu, Dongsheng; Shepherd, Lois E; Jiang, Shan; Vickery, Tammi; Mardis, Elaine; Leung, Samuel; Ung, Karen; Pritchard, Kathleen I; Parker, Joel S; Bernard, Philip S; Perou, Charles M; Ellis, Matthew J; Nielsen, Torsten O

    2012-08-15

    Gene expression profiling classifies breast cancer into intrinsic subtypes based on the biology of the underlying disease pathways. We have used material from a prospective randomized trial of tamoxifen versus placebo in premenopausal women with primary breast cancer (NCIC CTG MA.12) to evaluate the prognostic and predictive significance of intrinsic subtypes identified by both the PAM50 gene set and by immunohistochemistry. Total RNA from 398 of 672 (59%) patients was available for intrinsic subtyping with a quantitative reverse transcriptase PCR (qRT-PCR) 50-gene predictor (PAM50) for luminal A, luminal B, HER-2-enriched, and basal-like subtypes. A tissue microarray was also constructed from 492 of 672 (73%) of the study population to assess a panel of six immunohistochemical (IHC) antibodies to define the same intrinsic subtypes. Classification into intrinsic subtypes by the PAM50 assay was prognostic for both disease-free survival (DFS; P = 0.0003) and overall survival (OS; P = 0.0002), whereas classification by the IHC panel was not. Luminal subtype by PAM50 was predictive of tamoxifen benefit [DFS: HR, 0.52; 95% confidence interval (CI), 0.32-0.86 vs. HR, 0.80; 95% CI, 0.50-1.29 for nonluminal subtypes], although the interaction test was not significant (P = 0.24), whereas neither subtyping by central immunohistochemistry nor by local estrogen receptor (ER) or progesterone receptor (PR) status were predictive. Risk of relapse (ROR) modeling with the PAM50 assay produced a continuous risk score in both node-negative and node-positive disease. In the MA.12 study, intrinsic subtype classification by qRT-PCR with the PAM50 assay was superior to IHC profiling for both prognosis and prediction of benefit from adjuvant tamoxifen.

  17. Microbial forensics: predicting phenotypic characteristics and environmental conditions from large-scale gene expression profiles.

    Science.gov (United States)

    Kim, Minseung; Zorraquino, Violeta; Tagkopoulos, Ilias

    2015-03-01

    A tantalizing question in cellular physiology is whether the cellular state and environmental conditions can be inferred from the expression signature of an organism. To investigate this relationship, we created an extensive normalized gene expression compendium for the bacterium Escherichia coli that was further enriched with meta-information through an iterative learning procedure. We then constructed an ensemble method to predict environmental and cellular state, including strain, growth phase, medium, oxygen level, antibiotic and carbon source presence. Results show that gene expression is an excellent predictor of environmental structure, with multi-class ensemble models achieving balanced accuracy from 70.0% (±3.5%) to 98.3% (±2.3%) for the various characteristics. Interestingly, this performance can be significantly boosted when environmental and strain characteristics are simultaneously considered, as a composite classifier that captures the inter-dependencies of three characteristics (medium, phase and strain) achieved 10.6% (±1.0%) higher performance than any individual model. Contrary to expectations, only 59% of the top informative genes were also identified as differentially expressed under the respective conditions. Functional analysis of the respective genetic signatures implicates a wide spectrum of Gene Ontology terms and KEGG pathways with condition-specific information content, including iron transport, transferases, and enterobactin synthesis. Further experimental phenotypic-to-genotypic mapping that we conducted for knock-out mutants argues for the information content of top-ranked genes. This work demonstrates the degree to which genome-scale transcriptional information can be predictive of latent, heterogeneous and seemingly disparate phenotypic and environmental characteristics, with far-reaching applications.

  18. Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites.

    Science.gov (United States)

    Qin, Zhaohui S; McCue, Lee Ann; Thompson, William; Mayerhofer, Linda; Lawrence, Charles E; Liu, Jun S

    2003-04-01

    The identification of co-regulated genes and their transcription-factor binding sites (TFBS) are key steps toward understanding transcription regulation. In addition to effective laboratory assays, various computational approaches for the detection of TFBS in promoter regions of coexpressed genes have been developed. The availability of complete genome sequences combined with the likelihood that transcription factors and their cognate sites are often conserved during evolution has led to the development of phylogenetic footprinting. The modus operandi of this technique is to search for conserved motifs upstream of orthologous genes from closely related species. The method can identify hundreds of TFBS without prior knowledge of co-regulation or coexpression. Because many of these predicted sites are likely to be bound by the same transcription factor, motifs with similar patterns can be put into clusters so as to infer the sets of co-regulated genes, that is, the regulons. This strategy utilizes only genome sequence information and is complementary to and confirmative of gene expression data generated by microarray experiments. However, the limited data available to characterize individual binding patterns, the variation in motif alignment, motif width, and base conservation, and the lack of knowledge of the number and sizes of regulons make this inference problem difficult. We have developed a Gibbs sampling-based Bayesian motif clustering (BMC) algorithm to address these challenges. Tests on simulated data sets show that BMC produces many fewer errors than hierarchical and K-means clustering methods. The application of BMC to hundreds of predicted gamma-proteobacterial motifs correctly identified many experimentally reported regulons, inferred the existence of previously unreported members of these regulons, and suggested novel regulons.

  19. Microbial forensics: predicting phenotypic characteristics and environmental conditions from large-scale gene expression profiles.

    Directory of Open Access Journals (Sweden)

    Minseung Kim

    2015-03-01

    Full Text Available A tantalizing question in cellular physiology is whether the cellular state and environmental conditions can be inferred from the expression signature of an organism. To investigate this relationship, we created an extensive normalized gene expression compendium for the bacterium Escherichia coli that was further enriched with meta-information through an iterative learning procedure. We then constructed an ensemble method to predict environmental and cellular state, including strain, growth phase, medium, oxygen level, antibiotic and carbon source presence. Results show that gene expression is an excellent predictor of environmental structure, with multi-class ensemble models achieving balanced accuracy from 70.0% (±3.5%) to 98.3% (±2.3%) for the various characteristics. Interestingly, this performance can be significantly boosted when environmental and strain characteristics are simultaneously considered, as a composite classifier that captures the inter-dependencies of three characteristics (medium, phase and strain) achieved 10.6% (±1.0%) higher performance than any individual model. Contrary to expectations, only 59% of the top informative genes were also identified as differentially expressed under the respective conditions. Functional analysis of the respective genetic signatures implicates a wide spectrum of Gene Ontology terms and KEGG pathways with condition-specific information content, including iron transport, transferases, and enterobactin synthesis. Further experimental phenotypic-to-genotypic mapping that we conducted for knock-out mutants argues for the information content of top-ranked genes. This work demonstrates the degree to which genome-scale transcriptional information can be predictive of latent, heterogeneous and seemingly disparate phenotypic and environmental characteristics, with far-reaching applications.

  20. In vivo validation of a computationally predicted conserved Ath5 target gene set.

    Directory of Open Access Journals (Sweden)

    Filippo Del Bene

    2007-09-01

    Full Text Available So far, the computational identification of transcription factor binding sites is hampered by the complexity of vertebrate genomes. Here we present an in silico procedure to predict target sites of a transcription factor in complex genomes using its binding site. In a first step, sequence comparison of closely related genomes identifies the binding sites in conserved cis-regulatory regions (phylogenetic footprinting). Subsequently, more remote genomes are introduced into the comparison to identify highly conserved and therefore putatively functional binding sites (phylogenetic filtering). When applied to the binding site of atonal homolog 5 (Ath5 or ATOH7), this procedure efficiently filters evolutionarily conserved binding sites out of more than 300,000 instances in a vertebrate genome. We validate a selection of the linked target genes by showing coexpression with and transcriptional regulation by Ath5. Finally, chromatin immunoprecipitation demonstrates the occupancy of the target gene promoters by Ath5. Thus, our procedure, applied to whole genomes, is a fast and predictive tool to filter in silico the target genes of a given transcription factor with a defined binding site.

  1. CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

    Science.gov (United States)

    Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A

    2012-07-01

    Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

  2. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

    Science.gov (United States)

    Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

    2013-01-01

    Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on Github, a cloud image is available, and an example implementation can be seen at.

  3. Combining Hi-C data with phylogenetic correlation to predict the target genes of distal regulatory elements in human genome.

    Science.gov (United States)

    Lu, Yulan; Zhou, Yuanpeng; Tian, Weidong

    2013-12-01

    Defining the target genes of distal regulatory elements (DREs), such as enhancers, repressors and insulators, is a challenging task. The recently developed Hi-C technology is designed to capture chromosome conformation structure by high-throughput sequencing, and can potentially be used to determine the target genes of DREs. However, Hi-C data are noisy, making it difficult to directly use Hi-C data to identify DRE-target gene relationships. In this study, we show that DRE-gene pairs that are confirmed by Hi-C data are strongly phylogenetically correlated, and have thus developed a method that combines Hi-C read counts with phylogenetic correlation to predict long-range DRE-target gene relationships. Analysis of predicted DRE-target gene pairs shows that genes regulated by a large number of DREs tend to have essential functions, and genes regulated by the same DREs tend to be functionally related and co-expressed. In addition, we show with a couple of examples that the predicted target genes of DREs can help explain the causal roles of disease-associated single-nucleotide polymorphisms located in the DREs. As such, these predictions will be of importance not only for our understanding of the function of DREs but also for elucidating the causal roles of disease-associated noncoding single-nucleotide polymorphisms.

  4. A hemocyte gene expression signature correlated with predictive capacity of oysters to survive Vibrio infections

    Directory of Open Access Journals (Sweden)

    Rosa Rafael

    2012-06-01

    Full Text Available Background: The complex balance between environmental and host factors is an important determinant of susceptibility to infection. Disturbances of this equilibrium may result in multifactorial diseases as illustrated by the summer mortality syndrome, a worldwide and complex phenomenon that affects the oyster Crassostrea gigas. The summer mortality syndrome reveals a physiological intolerance making this oyster species susceptible to diseases. Exploration of the genetic basis governing oyster resistance or susceptibility to infections is thus a major goal for understanding field mortality events. In this context, we used high-throughput genomic approaches to identify genetic traits that may characterize inherent survival capacities in C. gigas. Results: Using digital gene expression (DGE), we analyzed the transcriptomes of hemocytes (immunocompetent cells) of oysters able or not able to survive infections by Vibrio species shown to be involved in summer mortalities. Hemocytes were nonlethally collected from oysters before Vibrio experimental infection, and two DGE libraries were generated from individuals that survived or did not survive. Exploration of DGE data and microfluidic qPCR analyses at the individual level showed an extraordinary polymorphism in gene expression, but also a set of hemocyte-expressed genes whose basal mRNA levels discriminate oyster capacity to survive infections by the pathogenic V. splendidus LGP32. Finally, we identified a signature of 14 genes that predicted oyster survival capacity. Their expression is likely driven by distinct transcriptional regulation processes associated or not associated with gene copy number variation (CNV). Conclusions: We provide here for the first time in oyster a gene expression survival signature that represents a useful tool for understanding mortality events and for assessing genetic traits of interest for disease resistance selection programs.

  5. Co-expressed Pathways DataBase for Tomato: a database to predict pathways relevant to a query gene.

    Science.gov (United States)

    Narise, Takafumi; Sakurai, Nozomu; Obayashi, Takeshi; Ohta, Hiroyuki; Shibata, Daisuke

    2017-06-05

    Gene co-expression, the similarity of gene expression profiles under various experimental conditions, has been used as an indicator of functional relationships between genes, and many co-expression databases have been developed for predicting gene functions. These databases usually provide users with a co-expression network and a list of strongly co-expressed genes for a query gene. Several of these databases also provide functional information on a set of strongly co-expressed genes (i.e., provide biological processes and pathways that are enriched in these strongly co-expressed genes), which is generally analyzed via over-representation analysis (ORA). A limitation of this approach may be that users can predict gene functions only based on the strongly co-expressed genes. In this study, we developed a new co-expression database that enables users to predict the function of tomato genes from the results of functional enrichment analyses of co-expressed genes while considering the genes that are not strongly co-expressed. To achieve this, we used the ORA approach with several thresholds to select co-expressed genes, and performed gene set enrichment analysis (GSEA) applied to a ranked list of genes ordered by the co-expression degree. We found that internal correlation in pathways affected the significance levels of the enrichment analyses. Therefore, we introduced a new measure for evaluating the relationship between the gene and pathway, termed the percentile (p)-score, which enables users to predict functionally relevant pathways without being affected by the internal correlation in pathways. In addition, we evaluated our approaches using receiver operating characteristic curves, which concluded that the p-score could improve the performance of the ORA. We developed a new database, named Co-expressed Pathways DataBase for Tomato, which is available at http://cox-path-db.kazusa.or.jp/tomato . The database allows users to predict pathways that are relevant to a
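    The over-representation analysis (ORA) step described in this record can be illustrated with a small sketch. It assumes the common hypergeometric formulation of ORA, which the abstract does not spell out, and it does not reproduce the database's p-score. The gene names, the pathway, and the top-k cutoffs are hypothetical toy values.

```python
from math import comb

def ora_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Hypergeometric upper tail: P(overlap >= n_overlap) when n_selected
    genes are drawn at random from n_universe genes, n_pathway of which
    belong to the pathway."""
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)

# Hypothetical toy data: 20 genes already ranked by co-expression with a
# query gene (g0 is the most strongly co-expressed), and one pathway.
ranked_genes = [f"g{i}" for i in range(20)]
pathway = {"g0", "g1", "g2", "g10", "g15"}

# Repeat the test at several cutoffs, mirroring the idea of using several
# thresholds to select co-expressed genes.
for top_k in (3, 5, 10):
    selected = set(ranked_genes[:top_k])
    overlap = len(selected & pathway)
    p = ora_p_value(len(ranked_genes), len(pathway), top_k, overlap)
    print(f"top {top_k}: overlap={overlap}, p={p:.4g}")
```

    Running the test at more than one cutoff shows how the apparent enrichment of a pathway depends on where the line between "strongly" and "weakly" co-expressed genes is drawn, which is the limitation the record highlights.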

  6. Predicting Autism Spectrum Disorder Using Blood-based Gene Expression Signatures and Machine Learning

    Science.gov (United States)

    Oh, Dong Hoon; Kim, Il Bin; Kim, Seok Hyeon; Ahn, Dong Hyun

    2017-01-01

    Objective: The aim of this study was to identify a transcriptomic signature that could be used to classify subjects with autism spectrum disorder (ASD) compared to controls on the basis of blood gene expression profiles. The gene expression profiles could ultimately be used as diagnostic biomarkers for ASD. Methods: We used the published microarray data (GSE26415) from the Gene Expression Omnibus database, which included 21 young adults with ASD and 21 age- and sex-matched unaffected controls. Nineteen differentially expressed probes were identified from a training dataset (n=26, 13 ASD cases and 13 controls) using the limma package in R (adjusted p value <0.05) and were further analyzed in a test dataset (n=16, 8 ASD cases and 8 controls) using machine learning algorithms. Results: Hierarchical cluster analysis showed that subjects with ASD were relatively well-discriminated from controls. Based on the support vector machine and K-nearest neighbors analysis, validation of 19-DE probes with a test dataset resulted in an overall class prediction accuracy of 93.8% as well as a sensitivity and specificity of 100% and 87.5%, respectively. Conclusion: The results of our exploratory study suggest that the gene expression profiles identified from the peripheral blood samples of young adults with ASD can be used to identify a biological signature for ASD. Further study using a larger cohort and more homogeneous datasets is required to improve the diagnostic accuracy. PMID:28138110
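    The reported test-set figures can be checked with a short worked example. The confusion-matrix counts below are not given in the abstract; they are inferred as the only whole-number counts consistent with 8 ASD cases, 8 controls, 100% sensitivity and 87.5% specificity.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # fraction of cases detected
    specificity = tn / (tn + fp)               # fraction of controls cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Inferred counts (assumption): all 8 cases correct, 7 of 8 controls correct.
acc, sens, spec = classification_metrics(tp=8, fn=0, tn=7, fp=1)
print(f"accuracy={acc:.4f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

    With these counts the accuracy is 15/16 = 0.9375, which matches the reported 93.8% after rounding, so a single misclassified control accounts for the whole gap from perfect accuracy in a test set of this size.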

  7. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions.

    Science.gov (United States)

    Shugay, Mikhail; Ortiz de Mendíbil, Iñigo; Vizmanos, José L; Novo, Francisco J

    2013-10-15

    Gene fusions resulting from chromosomal aberrations are an important cause of cancer. The complexity of genomic changes in certain cancer types has hampered the identification of gene fusions by molecular cytogenetic methods, especially in carcinomas. This is changing with the advent of next-generation sequencing, which is detecting a substantial number of new fusion transcripts in individual cancer genomes. However, this poses the challenge of identifying those fusions with greater oncogenic potential amid a background of 'passenger' fusion sequences. In the present work, we have used some recently identified genomic hallmarks of oncogenic fusion genes to develop a pipeline for the classification of fusion sequences, namely, Oncofuse. The pipeline predicts the oncogenic potential of novel fusion genes, calculating the probability that a fusion sequence behaves as 'driver' of the oncogenic process based on features present in known oncogenic fusions. Cross-validation and extensive validation tests on independent datasets suggest a robust behavior with good precision and recall rates. We believe that Oncofuse could become a useful tool to guide experimental validation studies of novel fusion sequences found during next-generation sequencing analysis of cancer transcriptomes. Oncofuse is a naive Bayes Network Classifier trained and tested using Weka machine learning package. The pipeline is executed by running a Java/Groovy script, available for download at www.unav.es/genetica/oncofuse.html.

  8. Angiotensinogen gene polymorphism predicts hypertension, and iridological constitutional classification enhances the risk for hypertension in Koreans.

    Science.gov (United States)

    Cho, Joo-Jang; Hwang, Woo-Jun; Hong, Seung-Heon; Jeong, Hyun-Ja; Lee, Hye-Jung; Kim, Hyung-Min; Um, Jae-Young

    2008-05-01

    This study investigated the relationship between iridological constitution and angiotensinogen (AGN) gene polymorphism in hypertensives. In addition to the angiotensin-converting enzyme gene, the AGN genotype is one of the most well-studied genetic markers of hypertension. Iridology, a form of complementary and alternative medicine, is the diagnosis of medical conditions by noting irregularities in the pigmentation of the iris. Iridological constitution has a strong familial aggregation and is implicated in heredity. Therefore, the study classified 87 hypertensive patients with familial history of cerebral infarction and controls (n = 88) according to iris constitution, and determined AGN genotype. As a result, the AGN/TT genotype was associated with hypertension (chi2 = 13.413, p iridological constitutional classification increased the relative risk for hypertension in the subjects with AGN/T allele. These results suggest that AGN polymorphism predicts hypertension, and iridological constitutional classification enhances the risk for hypertension associated with AGN/T in a Korean population.

  9. A Digital Signal Processing Method for Gene Prediction with Improved Noise Suppression

    Directory of Open Access Journals (Sweden)

    Carreira Alex

    2004-01-01

    Full Text Available It has been observed that the protein-coding regions of DNA sequences exhibit period-three behaviour, which can be exploited to predict the location of coding regions within genes. Previously, discrete Fourier transform (DFT) and digital filter-based methods have been used for the identification of coding regions. However, these methods do not significantly suppress the noncoding regions in the DNA spectrum at . Consequently, a noncoding region may inadvertently be identified as a coding region. This paper introduces a new technique (a single digital filter operation followed by a quadratic window operation) that suppresses nearly all of the noncoding regions. The proposed method therefore improves the likelihood of correctly identifying coding regions in such genes.
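    The period-three property mentioned in this record can be illustrated with a minimal sketch of the plain DFT baseline, not of the filter-plus-quadratic-window technique the paper proposes. It assumes the usual approach of turning the sequence into four binary indicator sequences (one per base) and summing their spectral power at the frequency bin k = N/3. The function name, the normalisation by sequence length, and the toy sequences are illustrative choices.

```python
import cmath
import math

def period3_power(seq):
    """Spectral power at the period-three frequency (DFT bin k = N/3),
    summed over the binary indicator sequences of A, C, G and T and
    divided by the sequence length."""
    total = 0.0
    for base in "ACGT":
        acc = 0j
        for pos, ch in enumerate(seq):
            if ch == base:
                # DFT kernel exp(-2*pi*i*k*pos/N) with k = N/3
                acc += cmath.exp(-2j * math.pi * pos / 3)
        total += abs(acc) ** 2
    return total / len(seq)

codon_like = "ATG" * 30    # toy sequence with perfect period-three structure
period_four = "ACGT" * 30  # toy sequence with no period-three component

print(period3_power(codon_like))
print(period3_power(period_four))
```

    In practice the same quantity is computed over a sliding window along the sequence, and windows with high power are flagged as candidate coding regions; the record's point is that this baseline leaves enough residual power in noncoding regions to cause false positives.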

  10. Comparative analysis of codon usage patterns and identification of predicted highly expressed genes in five Salmonella genomes

    Directory of Open Access Journals (Sweden)

    Mondal U

    2008-01-01

    Full Text Available Purpose: To analyse codon usage patterns of five complete genomes of Salmonella, predict highly expressed genes, examine horizontally transferred pathogenicity-related genes to detect their presence in the strains, and scrutinize the nature of highly expressed genes to infer their lifestyle. Methods: Protein coding genes, ribosomal protein genes, and pathogenicity-related genes were analysed with Codon W and CAI (codon adaptation index) Calculator. Results: Translational efficiency plays a role in codon usage variation in Salmonella genes. Low bias was noticed in most of the genes. GC3 (guanine cytosine at third position) composition does not influence codon usage variation in the genes of these Salmonella strains. Among the cluster of orthologous groups (COGs), translation, ribosomal structure and biogenesis [J], and energy production and conversion [C] contained the highest number of potentially highly expressed (PHX) genes. Correspondence analysis reveals the conserved nature of the genes. Highly expressed genes were detected. Conclusions: Selection for translational efficiency is the major source of variation of codon usage in the genes of Salmonella. Evolution of pathogenicity-related genes as a unit suggests their ability to infect and exist as a pathogen. Presence of many PHX genes in the information storage and processing category of COGs indicated their lifestyle and revealed that they were not subjected to genome reduction.
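    The codon adaptation index (CAI) used in this record can be sketched as follows. The sketch assumes the standard definition: each codon gets a relative adaptiveness w equal to its count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon, and the CAI of a gene is the geometric mean of w over its codons. The two-amino-acid codon table and the reference counts are hypothetical toy values, and the code does not reproduce the Codon W or CAI Calculator tools named above.

```python
import math

# Hypothetical toy table: synonymous codon families for two amino acids only.
SYNONYMS = {"Lys": ["AAA", "AAG"], "Glu": ["GAA", "GAG"]}

def relative_adaptiveness(reference_counts):
    """w(codon) = count / count of the most used synonymous codon.
    Assumes every codon in the table has a positive reference count."""
    w = {}
    for codons in SYNONYMS.values():
        best = max(reference_counts[c] for c in codons)
        for c in codons:
            w[c] = reference_counts[c] / best
    return w

def cai(sequence, w):
    """Geometric mean of w over the codons of `sequence` that appear in w."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    scored = [c for c in codons if c in w]
    return math.exp(sum(math.log(w[c]) for c in scored) / len(scored))

# Hypothetical codon counts from a reference set of highly expressed genes.
reference_counts = {"AAA": 30, "AAG": 10, "GAA": 40, "GAG": 20}
weights = relative_adaptiveness(reference_counts)

print(cai("AAAGAA", weights))     # only preferred codons
print(cai("AAAGAAAAG", weights))  # one non-preferred codon (AAG)
```

    A gene built only from preferred codons scores 1, and each non-preferred codon pulls the geometric mean down, which is why a high CAI is used as a predictor of high expression.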

  11. Observed and predicted changes in virulence gene frequencies at 11 loci in a local barley powdery mildew population

    DEFF Research Database (Denmark)

    Hovmøller, M.S.; Munk, L.; Østergård, H.

    1993-01-01

    a survey comprising 11 virulence loc. Predictions were based on a model where selection forces were estimated through detailed mapping in the local area of host cultivars and their resistance genes, and taking into account the changes in distribution of host cultivars during the year caused by growth......The aim of the present study was to investigate observed and predicted changes in virulence gene frequencies in a local aerial powdery mildew population subject to selection by different host cultivars in a local barley area. Observed changes were based on genotypic frequencies obtained through...... with a constant distribution of host cultivars. Significant changes in gene frequencies were observed for virulence genes subject to strong direct selection as well as for genes subject mainly to indirect selection (hitchhiking). These patterns of changes were generally as predicted from the model. The influence...

  12. The use of Gene Ontology terms and KEGG pathways for analysis and prediction of oncogenes.

    Science.gov (United States)

    Xing, Zhihao; Chu, Chen; Chen, Lei; Kong, Xiangyin

    2016-11-01

    Oncogenes are a type of genes that have the potential to cause cancer. Most normal cells undergo programmed cell death, namely apoptosis, but activated oncogenes can help cells avoid apoptosis and survive. Thus, studying oncogenes is helpful for obtaining a good understanding of the formation and development of various types of cancers. In this study, we proposed a computational method, called OPM, for investigating oncogenes from the view of Gene Ontology (GO) and biological pathways. All investigated genes, including validated oncogenes retrieved from some public databases and other genes that have not been reported to be oncogenes thus far, were encoded into numeric vectors according to the enrichment theory of GO terms and KEGG pathways. Some popular feature selection methods, minimum redundancy maximum relevance and incremental feature selection, and an advanced machine learning algorithm, random forest, were adopted to analyze the numeric vectors to extract key GO terms and KEGG pathways. Along with the oncogenes, GO terms and KEGG pathways were discussed in terms of their relevance in this study. Some important GO terms and KEGG pathways were extracted using feature selection methods and were confirmed to be highly related to oncogenes. Additionally, the importance of these terms and pathways in predicting oncogenes was further demonstrated by finding new putative oncogenes based on them. This study investigated oncogenes based on GO terms and KEGG pathways. Some important GO terms and KEGG pathways were confirmed to be highly related to oncogenes. We hope that these GO terms and KEGG pathways can provide new insight for the study of oncogenes, particularly for building more effective prediction models to identify novel oncogenes. The program is available upon request. We hope that the new findings listed in this study may provide a new insight for the investigation of oncogenes. 
This article is part of a Special Issue entitled "System Genetics" Guest Editor
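
    The abstract does not say how the GO and KEGG enrichment values are scored. One common choice, assumed here purely for illustration, is the negative log10 of a hypergeometric tail probability for the overlap between a gene's neighbour set and a term's member set. All names below, including the "GO:1"-style identifiers used in any example, are hypothetical.

```python
from math import comb, inf, log10


def hypergeom_tail(k: int, population: int, annotated: int, sample: int) -> float:
    """P(X >= k) when `sample` genes are drawn from `population` genes,
    of which `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enrichment_score(k: int, population: int, annotated: int, sample: int) -> float:
    """Assumed scoring: -log10 of the hypergeometric tail probability."""
    p = hypergeom_tail(k, population, annotated, sample)
    return inf if p == 0 else -log10(p)


def encode_gene(neighbours: set, terms: dict, population: int) -> list:
    """Encode one gene as a vector of enrichment scores, one per term."""
    return [
        enrichment_score(len(neighbours & members), population,
                         len(members), len(neighbours))
        for _, members in sorted(terms.items())
    ]
```

    Vectors built this way could then be passed to a feature-selection step and a classifier such as a random forest, as the abstract describes.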

  13. Inflammation markers predict zinc transporter gene expression in women with type 2 diabetes mellitus.

    Science.gov (United States)

    Foster, Meika; Petocz, Peter; Samman, Samir

    2013-09-01

    The pathology of type 2 diabetes mellitus (DM) often is associated with underlying states of conditioned zinc deficiency and chronic inflammation. Zinc and omega-3 polyunsaturated fatty acids each exhibit anti-inflammatory effects and may be of therapeutic benefit in the disease. The present randomized, double-blind, placebo-controlled, 12-week trial was designed to investigate the effects of zinc (40 mg/day) and α-linolenic acid (ALA; 2 g/day flaxseed oil) supplementation on markers of inflammation [interleukin (IL)-1β, IL-6, tumor necrosis factor (TNF)-α, C-reactive protein (CRP)] and zinc transporter and metallothionein gene expression in 48 postmenopausal women with type 2 DM. No significant effects of zinc or ALA supplementation were observed on inflammatory marker concentrations or fold change in zinc transporter and metallothionein gene expression. Significant increases in plasma zinc concentrations were observed over time in the groups supplemented with zinc alone or combined with ALA (P=.007 and P=.009, respectively). An impact of zinc treatment on zinc transporter gene expression was found; ZnT5 was positively correlated with Zip3 mRNA (Pzinc, while zinc supplementation abolished the relationship between ZnT5 and Zip10. IL-6 predicted the expression levels and CRP predicted the fold change of the ZnT5, ZnT7, Zip1, Zip7 and Zip10 mRNA cluster (Pzinc transporter and metallothionein gene expression support an interrelationship between zinc homeostasis and inflammation in type 2 DM.

  14. Gene expression correlation analysis predicts involvement of high- and low-confidence risk genes in different stages of prostate carcinogenesis.

    Science.gov (United States)

    Yano, Kojiro

    2010-12-01

    Whole genome association studies have identified many loci associated with the risk of prostate cancer (PC). However, very few of the genes associated with these loci have been related to specific processes of prostate carcinogenesis. Therefore I inferred biological functions associated with these risk genes using gene expression correlation analysis. PC risk genes reported in the literature were classified as having high (Plow (Phigh-confidence genes and other genes in the microarray dataset, whereas correlation between low-confidence genes and other genes in PC showed smaller decrease. Genes involved in developmental processes were significantly correlated with all risk gene categories. Ectoderm development genes, which may be related to squamous metaplasia, and genes enriched in fetal prostate stem cells (PSCs) showed strong association with the high-confidence genes. The association between the PSC genes and the low-confidence genes was weak, but genes related to neural system genes showed strong association with low-confidence genes. The high-confidence risk genes may be associated with an early stage of prostate carcinogenesis, possibly involving PSCs and squamous metaplasia. The low-confidence genes may be involved in a later stage of carcinogenesis. © 2010 Wiley-Liss, Inc.

  15. Empathy, target distress, and neurohormone genes interact to predict aggression for others-even without provocation.

    Science.gov (United States)

    Buffone, Anneke E K; Poulin, Michael J

    2014-11-01

    Can empathy for others motivate aggression on their behalf? This research examined potential predictors of empathy-linked aggression including the emotional state of empathy, an empathy target's distress state, and the function of the social anxiety-modulating neuropeptides oxytocin and vasopressin. In Study 1 (N = 69), self-reported empathy combined with threat to a close other and individual differences in genes for the vasopressin receptor (AVPR1a rs3) and oxytocin receptor (OXTR rs53576) to predict self-reported aggression against a person who threatened a close other. In Study 2 (N = 162), induced empathy for a person combined with OXTR variation or with that person's distress and AVPR1a variation led to increased amount of hot sauce assigned to that person's competitor. Empathy uniquely predicts aggression and may do so by way of aspects of the human caregiving system in the form of oxytocin and vasopressin.

  16. Predicted Highly Expressed Genes in the Genomes of Streptomyces Coelicolor and Streptomyces Avermitilis and the Implications for their Metabolism.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Gang; Culley, David E.; Zhang, Weiwen

    2005-06-01

    SUMMARY-Highly expressed genes in bacteria often have a stronger codon bias than genes expressed at lower levels. In this study, a comparative analysis of predicted highly expressed (PHX) genes in the Streptomyces coelicolor and S. avermitilis genomes was performed using the codon adaptation index (CAI) as a numerical estimator of gene expression level. Although it has been suggested that there is little heterogeneity in codon usage in G+C rich bacteria, considerable heterogeneity was found among genes in two G+C rich Streptomyces genomes. Using ribosomal protein (RP) genes as references, ~10% of the genes were predicted to be PHX genes using a CAI cutoff value of greater than 0.78 and 0.75 in S. coelicolor and S. avermitilis, respectively. Most of the PHX genes were found to be located within the conserved cores of the Streptomyces linear chromosomes. The predicted PHX genes showed good agreement with the experimental data on expression levels collected by proteomic analysis (Hesketh et al., 2002). Among all PHX genes, 368 were conserved in both genomes. These represented most of the genes essential for cell growth, including those involved in protein and DNA biosynthesis, amino acid metabolism, central intermediary and energy metabolisms. Only a few genes directly involved in biosynthesis of secondary metabolites were predicted to be PHX genes. Correspondence analysis showed that the genes responsible for biosynthesis of secondary metabolites possessed different codon usage patterns from RP genes, suggesting that they were either under strong translational selection that may have driven the codon preference in another direction, or they were acquired by horizontal transfer during their origin and evolution. 
Nevertheless, several key genes responsible for producing precursors for secondary metabolites, such as crotonyl-CoA reductase and propionyl-CoA carboxylase, and genes necessary for initiation of secondary metabolism, such as adenosylmethionine synthetase were
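
    The codon adaptation index used in this study is the geometric mean of per-codon relative adaptiveness values derived from a reference gene set (here, ribosomal protein genes). The following is a minimal sketch under stated assumptions: the standard genetic code, a pseudocount of 0.5 for codons absent from the reference set, and hypothetical function names. It is not the authors' pipeline.

```python
from collections import Counter
from itertools import product
from math import exp, log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order for each position.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds: str) -> list:
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_genes: list) -> dict:
    """Weight of each codon = its count / count of the most used synonym."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    weights = {}
    for codon, aa in CODON_TABLE.items():
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no usage information
        best = max(counts[c] for c in family)
        if best:
            # Assumed pseudocount of 0.5 keeps unseen codons from zeroing the index.
            weights[codon] = max(counts[codon], 0.5) / best
    return weights


def cai(cds: str, weights: dict) -> float:
    """Geometric mean of the weights of the scorable codons in `cds`."""
    logs = [log(weights[c]) for c in codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return exp(sum(logs) / len(logs))
```

    A gene would then be flagged as PHX when its CAI exceeds a genome-specific cutoff, such as the 0.78 and 0.75 values reported in the abstract.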

  17. Bayesian state space models for inferring and predicting temporal gene expression profiles.

    Science.gov (United States)

    Liang, Yulan; Kelemen, Arpad

    2007-12-01

    Prediction of gene dynamic behavior is a challenging and important problem in genomic research, while estimating the temporal correlations and non-stationarity is key to this process. Unfortunately, most existing techniques used for the inclusion of the temporal correlations treat the time course as evenly distributed time intervals and use stationary models with time-invariant settings. This is an assumption that is often violated in microarray time course data since the time course expression data are at unequal time points, where the difference in sampling times varies from minutes to days. Furthermore, the unevenly spaced short time courses with sudden changes make the prediction of genetic dynamics difficult. In this paper, we develop two types of Bayesian state space models to tackle this challenge for inferring and predicting the gene expression profiles associated with diseases. In the univariate time-varying Bayesian state space models we treat both the stochastic transition matrix and the observation matrix as time-variant in a linear setting and point out that this can easily be extended to a nonlinear setting. In the multivariate Bayesian state space model we include temporal correlation structures in the covariance matrix estimations. In both models, the unevenly spaced short time courses with unseen time points are treated as hidden state variables. Bayesian approaches with various prior and hyper-prior models with MCMC algorithms are used to estimate the model parameters and hidden variables. We apply our models to multiple tissue polygenetic Affymetrix data sets. Results show that the predictions of the genomic dynamic behavior can be well captured by the proposed models. (c) 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
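
    To make the idea of a state space model over unevenly spaced time points concrete, the sketch below uses a much simpler stand-in than the paper's MCMC-based Bayesian models: a scalar random-walk (local level) Kalman filter whose process variance grows with the gap between sampling times. Unobserved time points are passed as None and handled by prediction alone. The function name and the parameters q, r, m0, and p0 are assumptions.

```python
def kalman_local_level(times, observations, q=1.0, r=1.0, m0=0.0, p0=1e6):
    """Filter a scalar random-walk state observed at uneven times.

    times        -- increasing sampling times
    observations -- measured values, or None where nothing was measured
    q            -- process variance per unit time (assumed)
    r            -- observation noise variance (assumed)
    Returns a list of (posterior mean, posterior variance) per time point.
    """
    mean, var = m0, p0
    previous = times[0]
    estimates = []
    for t, y in zip(times, observations):
        # Predict: uncertainty grows in proportion to the elapsed time.
        var += q * (t - previous)
        if y is not None:
            # Update: blend prediction and measurement by the Kalman gain.
            gain = var / (var + r)
            mean += gain * (y - mean)
            var *= 1.0 - gain
        estimates.append((mean, var))
        previous = t
    return estimates
```

    With times [0, 1, 4] and a missing middle observation, the variance rises at the unobserved point and falls again once a measurement arrives.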

  18. RFMirTarget: predicting human microRNA target genes with a random forest classifier.

    Directory of Open Access Journals (Sweden)

    Mariana R Mendoza

    Full Text Available MicroRNAs are key regulators of eukaryotic gene expression whose fundamental role has already been identified in many cell pathways. The correct identification of miRNA targets is still a major challenge in bioinformatics and has motivated the development of several computational methods to overcome inherent limitations of experimental analysis. Indeed, the best results reported so far in terms of specificity and sensitivity are associated with machine learning-based methods for microRNA-target prediction. Following this trend, in the current paper we discuss and explore a microRNA-target prediction method based on a random forest classifier, namely RFMirTarget. Despite their well-known robustness in general classification tasks, to the best of our knowledge, random forests have not been deeply explored for the specific context of predicting microRNA targets. Our framework first analyzes alignments between candidate microRNA-target pairs and extracts a set of structural, thermodynamic, alignment, seed and position-based features, upon which classification is performed. Experiments have shown that RFMirTarget outperforms several well-known classifiers with statistical significance, and that its performance is not impaired by the class imbalance problem or feature correlation. Moreover, comparing it against other algorithms for microRNA target prediction using independent test data sets from TarBase and starBase, we observe a very promising performance, with higher sensitivity in relation to other methods. Finally, tests performed with RFMirTarget show the benefits of feature selection even for a classifier with embedded feature importance analysis, and the consistency between relevant features identified and important biological properties for effective microRNA-target gene alignment.
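
    Of the feature groups listed in this abstract, the seed feature is the simplest to illustrate. The sketch below is not RFMirTarget's feature extractor; it is a minimal example, with hypothetical names, that locates perfect Watson-Crick matches to miRNA positions 2 to 8 in a target sequence. G:U wobble pairs and mismatches are ignored by assumption.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna: str, target: str, start: int = 2, end: int = 8) -> list:
    """Return 0-based target positions that perfectly pair with the seed.

    The seed is taken as miRNA positions start..end (1-based, inclusive).
    DNA input is accepted; T is converted to U.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    seed = mirna[start - 1:end]
    # The pairing site on the target, read 5'->3', is the reverse complement.
    site = seed.translate(COMPLEMENT)[::-1]
    positions = []
    index = target.find(site)
    while index != -1:
        positions.append(index)
        index = target.find(site, index + 1)
    return positions
```

    The number and location of such sites could serve as seed and position-based inputs to a classifier, alongside the structural and thermodynamic features the abstract mentions.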

  19. Gene expression profiles predictive of outcome and age in infant acute lymphoblastic leukemia: A Children's Oncology Group study

    NARCIS (Netherlands)

    H. Kang; C.S. Wilson (Carla); R. Harvey (R.); I.-M. Chen (I.-Ming); M.H. Murphy (Maurice); S.R. Atlas (Susan); E.J. Bedrick (Edward); M. Devidas (Meenakshi); A.J. Carroll; B.W. Robinson (Blaine); R.W. Stam (Ronald); M.G. Valsecchi (Maria Grazia); R. Pieters (Rob); N.A. Heerema (Nyla); J.M. Hilden (Joanne); C.A. Felix (Carolyn); G.H. Reaman (Gregory); B. Camitta (Bruce); N.J. Winick (Naomi); W.L. Carroll (William); S.D. Dreyer; S.P. Hunger (Stephen); S.F. Willman (Sami )

    2012-01-01

    Gene expression profiling was performed on 97 cases of infant ALL from Children's Oncology Group Trial P9407. Statistical modeling of an outcome predictor revealed 3 genes highly predictive of event-free survival (EFS), beyond age and MLL status: FLT3, IRX2, and TACC2. Low FLT3 expressio

  20. Immunohistochemical NF1 analysis does not predict NF1 gene mutation status in pheochromocytoma.

    Science.gov (United States)

    Stenman, Adam; Svahn, Fredrika; Welander, Jenny; Gustavson, Boel; Söderkvist, Peter; Gimm, Oliver; Juhlin, C Christofer

    2015-03-01

    Pheochromocytomas (PCCs) are tumors originating from the adrenal medulla displaying a diverse genetic background. While most PCCs are sporadic, about 40 % of the tumors have been associated with constitutional mutations in one of at least 14 known susceptibility genes. As 25 % of sporadic PCCs harbor somatic neurofibromin 1 gene (NF1) mutations, NF1 has been established as the most recurrently mutated gene in PCCs. To be able to pinpoint NF1-related pheochromocytoma (PCC) disease in clinical practice could facilitate the detection of familial cases, but the large size of the NF1 gene makes standard DNA sequencing methods cumbersome. The aim of this study was to examine whether mutations in the NF1 gene could be predicted by immunohistochemistry as a method to identify cases for further genetic characterization. Sixty-seven PCCs obtained from 67 unselected patients for which the somatic and constitutional mutational status of NF1 was known (49 NF1 wild type, 18 NF1 mutated) were investigated for NF1 protein immunoreactivity, and the results were correlated to clinical and genetic data. NF1 immunoreactivity was absent in the majority of the PCCs (44/67; 66 %), including 13 out of 18 cases (72 %) with a somatic or constitutional NF1 mutation. However, only a minority of the NF1 wild-type PCCs (18/49; 37 %) displayed retained NF1 immunoreactivity, thereby diminishing the specificity of the method. We conclude that NF1 immunohistochemistry alone is not a sufficient method to distinguish between NF1-mutated and non-mutated PCCs. In the clinical context, genetic screening therefore remains the most reliable tool to detect NF1-mutated PCCs.
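
    The counts in this abstract are enough to work out the test characteristics behind the conclusion. The sketch below treats absent NF1 immunoreactivity as a positive test for NF1 mutation; the two counts not stated directly (5 mutated cases with retained staining and 31 wild-type cases with absent staining) follow by subtraction. The function name is an assumption.

```python
def diagnostic_summary(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Positive test = absent NF1 immunoreactivity.
# 13 of 18 mutated PCCs were absent; 18 of 49 wild-type PCCs were retained.
nf1 = diagnostic_summary(tp=13, fn=18 - 13, tn=18, fp=49 - 18)
```

    Sensitivity comes out at 13/18 and specificity at 18/49, which matches the 72 % and 37 % figures in the abstract and shows why most positive results in this cohort come from wild-type tumors.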

  1. Chronic and Acute Stress, Gender, and Serotonin Transporter Gene-Environment Interactions Predicting Depression Symptoms in Youth

    Science.gov (United States)

    Hammen, Constance; Brennan, Patricia A.; Keenan-Miller, Danielle; Hazel, Nicholas A.; Najman, Jake M.

    2010-01-01

    Background: Many recent studies of serotonin transporter gene by environment effects predicting depression have used stress assessments with undefined or poor psychometric methods, possibly contributing to wide variation in findings. The present study attempted to distinguish between effects of acute and chronic stress to predict depressive…

  2. Prediction of Associations between microRNAs and Gene Expression in Glioma Biology.

    Directory of Open Access Journals (Sweden)

    Stefan Wuchty

    Full Text Available Despite progress in the determination of miR interactions, their regulatory role in cancer is only beginning to be unraveled. Utilizing gene expression data from 27 glioblastoma samples we found that the mere knowledge of physical interactions between specific mRNAs and miRs can be used to determine associated regulatory interactions, allowing us to identify 626 associated interactions, involving 128 miRs that putatively modulate the expression of 246 mRNAs. Experimentally determining the expression of miRs, we found an over-representation of over(under)-expressed miRs with various predicted mRNA target sequences. Such significantly associated miRs that putatively bind over-expressed genes strongly tend to have binding sites nearby the 3'UTR of the corresponding mRNAs, suggesting that the presence of the miRs near the translation stop site may be a factor in their regulatory ability. Our analysis predicted a significant association between miR-128 and the protein kinase WEE1, which we subsequently validated experimentally by showing that the over-expression of the naturally under-expressed miR-128 in glioma cells resulted in the inhibition of WEE1 in glioblastoma cells.

  3. Computational Prediction of MicroRNAs from Toxoplasma gondii Potentially Regulating the Hosts’ Gene Expression

    Institute of Scientific and Technical Information of China (English)

    Muserref Duygu Sacar; Caner Bagc; Jens Allmer

    2014-01-01

    MicroRNAs (miRNAs) were discovered two decades ago, yet there is still a great need for further studies elucidating their genesis and targeting in different phyla. Since experimental discovery and validation of miRNAs is difficult, computational predictions are indispensable and today most computational approaches employ machine learning. Toxoplasma gondii, a parasite residing within the cells of its hosts like human, uses miRNAs for its post-transcriptional gene regulation. It may also regulate its hosts’ gene expression, which has been shown in brain cancer. Since previous studies have shown that overexpressed miRNAs within the host are causal for disease onset, we hypothesized that T. gondii could export miRNAs into its host cell. We computationally predicted all hairpins from the genome of T. gondii and used mouse and human models to filter possible candidates. These were then further compared to known miRNAs in human and rodents and their expression was examined for T. gondii grown in mouse and human hosts, respectively. We found that among the millions of potential hairpins in T. gondii, only a few thousand pass filtering using a human or mouse model and that even fewer of those are expressed. Since they are expressed and differentially expressed in rodents and human, we suggest that there is a chance that T. gondii may export miRNAs into its hosts for direct regulation.

  4. Computational prediction of microRNAs from Toxoplasma gondii potentially regulating the hosts' gene expression.

    Science.gov (United States)

    Saçar, Müşerref Duygu; Bağcı, Caner; Allmer, Jens

    2014-10-01

    MicroRNAs (miRNAs) were discovered two decades ago, yet there is still a great need for further studies elucidating their genesis and targeting in different phyla. Since experimental discovery and validation of miRNAs is difficult, computational predictions are indispensable and today most computational approaches employ machine learning. Toxoplasma gondii, a parasite residing within the cells of its hosts like human, uses miRNAs for its post-transcriptional gene regulation. It may also regulate its hosts' gene expression, which has been shown in brain cancer. Since previous studies have shown that overexpressed miRNAs within the host are causal for disease onset, we hypothesized that T. gondii could export miRNAs into its host cell. We computationally predicted all hairpins from the genome of T. gondii and used mouse and human models to filter possible candidates. These were then further compared to known miRNAs in human and rodents and their expression was examined for T. gondii grown in mouse and human hosts, respectively. We found that among the millions of potential hairpins in T. gondii, only a few thousand pass filtering using a human or mouse model and that even fewer of those are expressed. Since they are expressed and differentially expressed in rodents and human, we suggest that there is a chance that T. gondii may export miRNAs into its hosts for direct regulation.

  5. Prediction of optimal gene functions for osteosarcoma using network-based- guilt by association method based on gene oncology and microarray profile.

    Science.gov (United States)

    Chen, Xinrang

    2017-06-01

    In the current study, we planned to predict the optimal gene functions for osteosarcoma (OS) by integrating network-based method with guilt by association (GBA) principle (called as network-based gene function inference approach) based on gene oncology (GO) data and gene expression profile. To begin with, differentially expressed genes (DEGs) were extracted using linear models for microarray data (LIMMA) package. Then, construction of differential co-expression network (DCN) relying on DEGs was implemented, and sub-DCN was identified using Spearman correlation coefficient (SCC). Subsequently, GO annotations for OS were collected according to known confirmed database and DEGs. Ultimately, gene functions were predicted by means of GBA principle based on the area under the curve (AUC) for GO terms, and we determined GO terms with AUC >0.7 as the optimal gene functions for OS. Totally, 123 DEGs and 137 GO terms were obtained for further analysis. A DCN was constructed, which included 123 DEGs and 7503 interactions. A total of 105 GO terms were identified when the threshold was set as AUC >0.5, which had a good classification performance. Among these 105 GO terms, 2 functions had the AUC >0.7 and were determined as the optimal gene functions including angiogenesis (AUC =0.767) and regulation of immune system process (AUC =0.710). These gene functions appear to have potential for early detection and clinical treatment of OS in the future.
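
    The co-expression step in this abstract rests on the Spearman correlation coefficient, which is the Pearson correlation of the ranked values. A minimal dependency-free sketch follows; function names are assumptions, ties receive average ranks, and constant inputs (zero variance) are not handled.

```python
def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list, y: list) -> float:
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list, y: list) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

    Computing this coefficient for each pair of differentially expressed genes across samples, then keeping pairs above a chosen threshold, would yield edge weights for a co-expression network of the kind described.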

  6. Can survival prediction be improved by merging gene expression data sets?

    Directory of Open Access Journals (Sweden)

    Haleh Yasrebi

    Full Text Available BACKGROUND: High-throughput gene expression profiling technologies, generating a wealth of data, are increasingly used for characterization of tumor biopsies for clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial study limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not clear at all whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This apparently was due to the fact that a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation of time to death or relapse, (e) the reduced number of genes in the merged data

  7. Prediction of metabolic flux distribution from gene expression data based on the flux minimization principle.

    Directory of Open Access Journals (Sweden)

    Hyun-Seob Song

    Full Text Available Prediction of possible flux distributions in a metabolic network provides detailed phenotypic information that links metabolism to cellular physiology. To estimate metabolic steady-state fluxes, the most common approach is to solve a set of macroscopic mass balance equations subjected to stoichiometric constraints while attempting to optimize an assumed optimal objective function. This assumption is justifiable in specific cases but may be invalid when tested across different conditions, cell populations, or other organisms. With an aim to providing a more consistent and reliable prediction of flux distributions over a wide range of conditions, in this article we propose a framework that uses the flux minimization principle to predict active metabolic pathways from mRNA expression data. The proposed algorithm minimizes a weighted sum of flux magnitudes, while biomass production can be bounded to fit an ample range from very low to very high values according to the analyzed context. We have formulated the flux weights as a function of the corresponding enzyme reaction's gene expression value, enabling the creation of context-specific fluxes based on a generic metabolic network. In case studies of wild-type Saccharomyces cerevisiae, and wild-type and mutant Escherichia coli strains, our method achieved high prediction accuracy, as gauged by correlation coefficients and sums of squared error, with respect to the experimentally measured values. In contrast to other approaches, our method was able to provide quantitative predictions for both model organisms under a variety of conditions. Our approach requires no prior knowledge or assumption of a context-specific metabolic functionality and does not require trial-and-error parameter adjustments. Thus, our framework is of general applicability for modeling the transcription-dependent metabolism of bacteria and yeasts.
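
    The optimization described in this abstract can be written compactly. The formulation below is a sketch reconstructed from the abstract's wording only; the symbols S (stoichiometric matrix), v (flux vector), w_i (flux weights), g_i (expression value associated with reaction i), f (the unspecified weight function), and the bound names are all assumptions rather than the paper's notation.

```latex
\begin{aligned}
\min_{v} \quad & \sum_{i=1}^{n} w_i \,\lvert v_i \rvert,
  \qquad w_i = f(g_i), \\
\text{subject to} \quad & S\,v = 0, \\
& v_i^{\min} \le v_i \le v_i^{\max}, \quad i = 1, \dots, n, \\
& \mu^{\min} \le v_{\mathrm{biomass}} \le \mu^{\max}.
\end{aligned}
```

    Here S v = 0 is the steady-state mass balance, the last constraint bounds biomass production to a context-dependent range, and f would presumably assign smaller weights to reactions whose genes are more highly expressed, so that those fluxes are penalized less.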

  8. A gene expression signature that can predict the recurrence of tamoxifen-treated primary breast cancer.

    Science.gov (United States)

    Chanrion, Maïa; Negre, Vincent; Fontaine, Hélène; Salvetat, Nicolas; Bibeau, Frédéric; Mac Grogan, Gaëtan; Mauriac, Louis; Katsaros, Dionyssios; Molina, Franck; Theillet, Charles; Darbon, Jean-Marie

    2008-03-15

    The identification of a molecular signature predicting the relapse of tamoxifen-treated primary breast cancers should help the therapeutic management of estrogen receptor-positive cancers. A series of 132 primary tumors from patients who received adjuvant tamoxifen were analyzed for expression profiles at the whole-genome level by 70-mer oligonucleotide microarrays. A supervised analysis was done to identify an expression signature. We defined a 36-gene signature that correctly classified 78% of patients with relapse and 80% of relapse-free patients (79% accuracy). Using 23 independent tumors, we confirmed the accuracy of the signature (78%) whose relevance was further shown by using published microarray data from 60 tamoxifen-treated patients (63% accuracy). Univariate analysis using the validation set of 83 tumors showed that the 36-gene classifier is more efficient in predicting disease-free survival than the traditional histopathologic prognostic factors and is as effective as the Nottingham Prognostic Index or the "Adjuvant!" software. Multivariate analysis showed that the molecular signature is the only independent prognostic factor. A comparison with several already published signatures demonstrated that the 36-gene signature is among the best to classify tumors from both training and validation sets. Kaplan-Meier analyses emphasized its prognostic power both on the whole cohort of patients and on a subgroup with an intermediate risk of recurrence as defined by the St. Gallen criteria. This study identifies a molecular signature specifying a subgroup of patients who do not gain benefits from tamoxifen treatment. These patients may therefore be eligible for alternative endocrine therapies and/or chemotherapy.

  9. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction

    Directory of Open Access Journals (Sweden)

    Garzón-Martínez Gina A

    2012-04-01

    Full Text Available Abstract Background Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the
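
    The abstract does not name the tool or thresholds used to develop SSR markers. As a hedged illustration of what SSR (simple sequence repeat) detection in assembled ESTs involves, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The function names and the minimum repeat counts (6 for dinucleotides, 5 for trinucleotides, 4 for tetranucleotides) are assumptions.

```python
import re


def is_primitive(motif: str) -> bool:
    """False when the motif is itself a repeat of a shorter unit (e.g. ATAT)."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats: dict = None) -> list:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}  # assumed thresholds per motif length
    seq = seq.upper()
    hits = []
    for k, minimum in sorted(min_repeats.items()):
        # A k-mer followed by at least (minimum - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, minimum - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)
```

    Flanking sequence around each hit would then be used for primer design, which this sketch does not attempt.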

  10. Predicting miRNA Targets by Integrating Gene Regulatory Knowledge with Expression Profiles.

    Directory of Open Access Journals (Sweden)

    Weijia Zhang

    Full Text Available microRNAs (miRNAs) play crucial roles in post-transcriptional gene regulation of both plants and mammals, and dysfunctions of miRNAs are often associated with tumorigenesis and development through the effects on their target messenger RNAs (mRNAs). Identifying miRNA functions is critical for understanding cancer mechanisms and determining the efficacy of drugs. Computational methods analyzing high-throughput data offer great assistance in understanding the diverse and complex relationships between miRNAs and mRNAs. However, most of the existing methods do not fully utilise the available knowledge in biology to reduce the uncertainty in the modeling process. Therefore it is desirable to develop a method that can seamlessly integrate existing biological knowledge and high-throughput data into the process of discovering miRNA regulation mechanisms. In this article we present an integrative framework, CIDER (Causal miRNA target Discovery with Expression profile and Regulatory knowledge), to predict miRNA targets. CIDER is able to utilise a variety of gene regulation knowledge, including transcriptional and post-transcriptional knowledge, and to exploit gene expression data for the discovery of miRNA-mRNA regulatory relationships. The benefits of our framework are demonstrated by both a simulation study and the analysis of the epithelial-to-mesenchymal transition (EMT) and the breast cancer (BRCA) datasets. Our results reveal that even a limited amount of either Transcription Factor (TF)-miRNA or miRNA-mRNA regulatory knowledge improves the performance of miRNA target prediction, and the combination of the two types of knowledge enhances the improvement further. Another useful property of the framework is that its performance increases monotonically with the increase of regulatory knowledge.

  11. Bioinformatic Prediction of Gene Functions Regulated by Quorum Sensing in the Bioleaching Bacterium Acidithiobacillus ferrooxidans

    Directory of Open Access Journals (Sweden)

    Alvaro Banderas

    2013-08-01

    Full Text Available The biomining bacterium Acidithiobacillus ferrooxidans oxidizes sulfide ores and promotes metal solubilization. The efficiency of this process depends on the attachment of cells to surfaces, a process regulated by quorum sensing (QS) cell-to-cell signalling in many Gram-negative bacteria. At. ferrooxidans has a functional QS system and the presence of AHLs enhances its attachment to pyrite. However, direct targets of the QS transcription factor AfeR remain unknown. In this study, a bioinformatic approach was used to infer possible AfeR direct targets based on the particular palindromic features of the AfeR binding site. A set of Hidden Markov Models designed to maintain palindromic regions and vary non-palindromic regions was used to screen for putative binding sites. By annotating the context of each predicted binding site (PBS), we classified them according to their positional coherence relative to other putative genomic structures such as start codons, RNA polymerase promoter elements and intergenic regions. We then used the Multiple EM for Motif Elicitation algorithm (MEME) to filter out low homology PBSs. In summary, 75 target-genes were identified, 34 of which have a higher confidence level. Among the identified genes, we found afeR itself, zwf, genes encoding glycosyltransferase activities, metallo-beta lactamases, and active transport-related proteins. Glycosyltransferases and Zwf (Glucose 6-phosphate-1-dehydrogenase) might be directly involved in polysaccharide biosynthesis and attachment to minerals by At. ferrooxidans cells during the bioleaching process.

  12. Melanopsin Gene Variations Interact With Season to Predict Sleep Onset and Chronotype

    Science.gov (United States)

    Roecklein, Kathryn A.; Wong, Patricia M.; Franzen, Peter L.; Hasler, Brant P.; Wood-Vasey, W. Michael; Nimgaonkar, Vishwajit L.; Miller, Megan A.; Kepreos, Kyle M.; Ferrell, Robert E.; Manuck, Stephen B.

    2013-01-01

    The human melanopsin gene has been reported to mediate risk for seasonal affective disorder (SAD), which is hypothesized to be caused by decreased photic input during winter when light levels fall below threshold, resulting in differences in circadian phase and/or sleep. However, it is unclear if melanopsin increases risk of SAD by causing differences in sleep or circadian phase, or if those differences are symptoms of the mood disorder. To determine if melanopsin sequence variations are associated with differences in sleep-wake behavior among those not suffering from a mood disorder, the authors tested associations between melanopsin gene polymorphisms and self-reported sleep timing (sleep onset and wake time) in a community sample (N = 234) of non-Hispanic Caucasian participants (age 30–54 yrs) with no history of psychological, neurological, or sleep disorders. The authors also tested the effect of melanopsin variations on differences in preferred sleep and activity timing (i.e., chronotype), which may reflect differences in circadian phase, sleep homeostasis, or both. Daylength on the day of assessment was measured and included in analyses. DNA samples were genotyped for melanopsin gene polymorphisms using fluorescence polarization. P10L genotype interacted with daylength to predict self-reported sleep onset (interaction p seasonal patterns of recurrence or exacerbation. PMID:22881342

  13. Gastric microbiota and predicted gene functions are altered after subtotal gastrectomy in patients with gastric cancer.

    Science.gov (United States)

    Tseng, Ching-Hung; Lin, Jaw-Town; Ho, Hsiu J; Lai, Zi-Lun; Wang, Chang-Bi; Tang, Sen-Lin; Wu, Chun-Ying

    2016-02-10

    Subtotal gastrectomy (i.e., partial removal of the stomach), a surgical treatment for early-stage distal gastric cancer, is usually accompanied by highly selective vagotomy and Billroth II reconstruction, leading to dramatic changes in the gastric environment. Based on accumulating evidence of a strong link between human gut microbiota and host health, a 2-year follow-up study was conducted to characterize the effects of subtotal gastrectomy. Gastric microbiota and predicted gene functions inferred from 16S rRNA gene sequencing were analyzed before and after surgery. The results demonstrated that gastric microbiota is significantly more diverse after surgery. Ralstonia and Helicobacter were the top two genera of discriminant abundance in the cancerous stomach before surgery, while Streptococcus and Prevotella were the two most abundant genera after tumor excision. Furthermore, N-nitrosation genes were prevalent before surgery, whereas bile salt hydrolase, NO and N2O reductase were prevalent afterward. To our knowledge, this is the first report to document changes in gastric microbiota before and after surgical treatment of stomach cancer.

  14. Response-predictive gene expression profiling of glioma progenitor cells in vitro.

    Directory of Open Access Journals (Sweden)

    Sylvia Moeckel

    Full Text Available BACKGROUND: High-grade gliomas are amongst the most deadly human tumors. Treatment results are disappointing. Still, in several trials around 20% of patients respond to therapy. To date, diagnostic strategies to identify patients that will profit from a specific therapy do not exist. METHODS: In this study, we used serum-free short-term treated in vitro cell cultures to predict treatment response in vitro. This approach allowed us (a) to enrich specimens for brain tumor initiating cells and (b) to confront cells with a therapeutic agent before expression profiling. RESULTS: As a proof of principle we analyzed gene expression in 18 short-term serum-free cultures of high-grade gliomas enhanced for brain tumor initiating cells (BTIC) before and after in vitro treatment with the tyrosine kinase inhibitor Sunitinib. Profiles from treated progenitor cells allowed prediction of therapy-induced impairment of proliferation in vitro. CONCLUSION: For the tyrosine kinase inhibitor Sunitinib used in this dataset, the approach revealed additional predictive information in comparison to the evaluation of classical signaling analysis.

  15. Gene expression signatures that predict radiation exposure in mice and humans.

    Directory of Open Access Journals (Sweden)

    Holly K Dressman

    2007-04-01

    Full Text Available BACKGROUND: The capacity to assess environmental inputs to biological phenotypes is limited by methods that can accurately and quantitatively measure these contributions. One such example can be seen in the context of exposure to ionizing radiation. METHODS AND FINDINGS: We have made use of gene expression analysis of peripheral blood (PB) mononuclear cells to develop expression profiles that accurately reflect prior radiation exposure. We demonstrate that expression profiles can be developed that not only predict radiation exposure in mice but also distinguish the level of radiation exposure, ranging from 50 cGy to 1,000 cGy. Likewise, a molecular signature of radiation response developed solely from irradiated human patient samples can predict and distinguish irradiated human PB samples from nonirradiated samples with an accuracy of 90%, sensitivity of 85%, and specificity of 94%. We further demonstrate that a radiation profile developed in the mouse can correctly distinguish PB samples from irradiated and nonirradiated human patients with an accuracy of 77%, sensitivity of 82%, and specificity of 75%. Taken together, these data demonstrate that molecular profiles can be generated that are highly predictive of different levels of radiation exposure in mice and humans. CONCLUSIONS: We suggest that this approach, with additional refinement, could provide a method to assess the effects of various environmental inputs into biological phenotypes as well as providing a more practical application of a rapid molecular screening test for the diagnosis of radiation exposure.
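
    The accuracy, sensitivity, and specificity figures quoted in the record above are standard confusion-matrix ratios. The sketch below shows how such figures are typically computed; the function name and the sample counts are illustrative assumptions and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: exposed samples classified correctly / incorrectly.
    tn/fp: unexposed samples classified correctly / incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),      # fraction of exposed samples detected
        "specificity": tn / (tn + fp),      # fraction of unexposed samples cleared
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
```

    Because accuracy depends on the ratio of exposed to unexposed samples, it is usually reported together with sensitivity and specificity, as the abstract does.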

  16. Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data

    CERN Document Server

    Kastrin, Andrej

    2010-01-01

    Class prediction is an important application of microarray gene expression data analysis. The high-dimensionality of microarray data, where number of genes (variables) is very large compared to the number of samples (observations), makes the application of many prediction techniques (e.g., logistic regression, discriminant analysis) difficult. An efficient way to solve this problem is by using dimension reduction statistical techniques. Increasingly used in psychology-related applications, Rasch model (RM) provides an appealing framework for handling high-dimensional microarray data. In this paper, we study the potential of RM-based modeling in dimensionality reduction with binarized microarray gene expression data and investigate its prediction accuracy in the context of class prediction using linear discriminant analysis. Two different publicly available microarray data sets are used to illustrate a general framework of the approach. Performance of the proposed method is assessed by re-randomization s...

  17. Prediction of the prognosis of breast cancer in routine histologic specimens using a simplified, low-cost gene expression signature

    DEFF Research Database (Denmark)

    Marcell, S.A.; Balazs, A.; Emese, A.;

    2013-01-01

    Background: Grade 2 breast carcinomas do not form a uniform prognostic group. Aim: To extend the number of patients and the investigated genes of a previously...... identified prognostic signature described by the authors that reflect chromosomal instability in order to refine characterization of grade 2 breast cancers and identify driver genes. Methods: Using publicly available databases, the authors selected 9 target and 3 housekeeping genes that are capable to divide...... prognosis groups. Centroid-based ranking showed that 3 genes, FOXM1, TOP2A and CLDN4 were able to separate the good and poor prognostic groups of grade 2 breast carcinomas. Conclusion: Using appropriately selected control genes, a limited set of genes is able to split prognostic groups of breast carcinomas...

  18. Mining predicted essential genes of Brugia malayi for nematode drug targets.

    Directory of Open Access Journals (Sweden)

    Sanjay Kumar

    Full Text Available We report results from the first genome-wide application of a rational drug target selection methodology to a metazoan pathogen genome, the completed draft sequence of Brugia malayi, a parasitic nematode responsible for human lymphatic filariasis. More than 1.5 billion people worldwide are at risk of contracting lymphatic filariasis and onchocerciasis, a related filarial disease. Drug treatments for filariasis have not changed significantly in over 20 years, and with the risk of resistance rising, there is an urgent need for the development of new anti-filarial drug therapies. The recent publication of the draft genomic sequence for B. malayi enables a genome-wide search for new drug targets. However, there is no functional genomics data in B. malayi to guide the selection of potential drug targets. To circumvent this problem, we have utilized the free-living model nematode Caenorhabditis elegans as a surrogate for B. malayi. Sequence comparisons between the two genomes allow us to map C. elegans orthologs to B. malayi genes. Using these orthology mappings and by incorporating the extensive genomic and functional genomic data, including genome-wide RNAi screens, that already exist for C. elegans, we identify potentially essential genes in B. malayi. Further incorporation of human host genome sequence data and a custom algorithm for prioritization enables us to collect and rank nearly 600 drug target candidates. Previously identified potential drug targets cluster near the top of our prioritized list, lending credibility to our methodology. Over-represented Gene Ontology terms, predicted InterPro domains, and RNAi phenotypes of C. elegans orthologs associated with the potential target pool are identified. By virtue of the selection procedure, the potential B. malayi drug targets highlight components of key processes in nematode biology such as central metabolism, molting and regulation of gene expression.

  19. Polymorphism at the Clock gene predicts phenology of long-distance migration in birds.

    Science.gov (United States)

    Saino, Nicola; Bazzi, Gaia; Gatti, Emanuele; Caprioli, Manuela; Cecere, Jacopo G; Possenti, Cristina D; Galimberti, Andrea; Orioli, Valerio; Bani, Luciano; Rubolini, Diego; Gianfranceschi, Luca; Spina, Fernando

    2015-04-01

    Dissecting phenotypic variance in life history traits into its genetic and environmental components is at the focus of evolutionary studies and of pivotal importance to identify the mechanisms and predict the consequences of human-driven environmental change. The timing of recurrent life history events (phenology) is under strong selection, but the study of the genes that control potential environmental canalization in phenological traits is in its infancy. Candidate genes for circadian behaviour entrained by photoperiod have been screened as potential controllers of phenological variation of breeding and moult in birds, with inconsistent results. Although photoperiodic control of migration is well established, no study has reported on migration phenology in relation to polymorphism at candidate genes in birds. We analysed variation in spring migration dates within four trans-Saharan migratory species (Luscinia megarhynchos; Ficedula hypoleuca; Anthus trivialis; Saxicola rubetra) at a Mediterranean island in relation to Clock and Adcyap1 polymorphism. Individuals with a larger number of glutamine residues in the poly-Q region of the Clock gene migrated significantly later in one or, respectively, two species depending on sex and whether the within-individual mean length or the length of the longer Clock allele was considered. The results hinted at dominance of the longer Clock allele. No significant evidence for migration date to covary with Adcyap1 polymorphism emerged. This is the first evidence that migration phenology is associated with Clock in birds. This finding is important for evolutionary studies of migration and sheds light on the mechanisms that drive bird phenological changes and population trends in response to climate change.

  20. Tissue-based microarray expression of genes predictive of metastasis in uveal melanoma and differentially expressed in metastatic uveal melanoma.

    Science.gov (United States)

    Demirci, Hakan; Reed, David; Elner, Victor M

    2013-10-01

    To screen the microarray expression of CDH1, ECM1, EIF1B, FXR1, HTR2B, ID2, LMCD1, LTA4H, MTUS1, RAB31, ROBO1, and SATB1 genes which are predictive of primary uveal melanoma metastasis, and NFKB2, PTPN18, MTSS1, GADD45B, SNCG, HHIP, IL12B, CDK4, RPLP0, RPS17, RPS12 genes that are differentially expressed in metastatic uveal melanoma in normal whole human blood and tissues prone to metastatic involvement by uveal melanoma. We screened the GeneNote and GNF BioGPS databases for microarray analysis of genes predictive of primary uveal melanoma metastasis and those differentially expressed in metastatic uveal melanoma in normal whole blood, liver, lung and skin. Microarray analysis showed expression of all 22 genes in normal whole blood, liver, lung and skin, which are the most common sites of metastases. In the GNF BioGPS database, data for expression of the HHIP gene in normal whole blood and skin was not complete. Microarray analysis of genes predicting systemic metastasis of uveal melanoma and genes differentially expressed in metastatic uveal melanoma may not be used as a biomarker for metastasis in whole blood, liver, lung, and skin. Their expression in tissues prone to metastasis may suggest that they play a role in tropism of uveal melanoma metastasis to these tissues.

  1. Tissue-Based Microarray Expression of Genes Predictive of Metastasis in Uveal Melanoma and Differentially Expressed in Metastatic Uveal Melanoma

    Directory of Open Access Journals (Sweden)

    Hakan Demirci

    2013-01-01

    Full Text Available Purpose: To screen the microarray expression of CDH1, ECM1, EIF1B, FXR1, HTR2B, ID2, LMCD1, LTA4H, MTUS1, RAB31, ROBO1, and SATB1 genes which are predictive of primary uveal melanoma metastasis, and NFKB2, PTPN18, MTSS1, GADD45B, SNCG, HHIP, IL12B, CDK4, RPLP0, RPS17, RPS12 genes that are differentially expressed in metastatic uveal melanoma in normal whole human blood and tissues prone to metastatic involvement by uveal melanoma. Methods: We screened the GeneNote and GNF BioGPS databases for microarray analysis of genes predictive of primary uveal melanoma metastasis and those differentially expressed in metastatic uveal melanoma in normal whole blood, liver, lung and skin. Results: Microarray analysis showed expression of all 22 genes in normal whole blood, liver, lung and skin, which are the most common sites of metastases. In the GNF BioGPS database, data for expression of the HHIP gene in normal whole blood and skin was not complete. Conclusions: Microarray analysis of genes predicting systemic metastasis of uveal melanoma and genes differentially expressed in metastatic uveal melanoma may not be used as a biomarker for metastasis in whole blood, liver, lung, and skin. Their expression in tissues prone to metastasis may suggest that they play a role in tropism of uveal melanoma metastasis to these tissues.

  2. Prediction of effective RNA interference targets and pathway-related genes in lepidopteran insects by RNA sequencing analysis.

    Science.gov (United States)

    Guan, Ruo-Bing; Li, Hai-Chao; Miao, Xue-Xia

    2017-01-06

    When using RNAi to study gene functions in Lepidoptera insects, we discovered that some genes could not be suppressed; instead, their expression levels could be up-regulated by dsRNA. To predict which genes could be easily silenced, we treated the Asian corn borer (Ostrinia furnacalis) with dsGFP and dsMLP. A transcriptome sequence analysis was conducted using the cDNAs 6 h after treatment with dsRNA. The results indicated that 160 genes were up-regulated and 44 genes were down-regulated by the two dsRNAs. Then, 50 co-up-regulated, 25 co-down-regulated and 43 unaffected genes were selected to determine their RNAi responses. All the 25 down-regulated genes were knocked down by their corresponding dsRNA. However, several of the up-regulated and unaffected genes were up-regulated when treated with their corresponding dsRNAs instead of being knocked down. The genes up-regulated by the dsGFP treatment may be involved in insect immune responses or the RNAi pathway. When the immune-related genes were excluded, only seven genes were induced by dsGFP, including ago-2 and dicer-2. These results not only provide a reference for efficient RNAi target prediction, but also provide some potential RNAi pathway-related genes for further study.

  3. Defining the cutoff value of MGMT gene promoter methylation and its predictive capacity in glioblastoma.

    Science.gov (United States)

    Brigliadori, Giovanni; Foca, Flavia; Dall'Agata, Monia; Rengucci, Claudia; Melegari, Elisabetta; Cerasoli, Serenella; Amadori, Dino; Calistri, Daniele; Faedi, Marina

    2016-06-01

    Despite advances in the treatment of glioblastoma (GBM), median survival is 12-15 months. O6-methylguanine-DNA methyltransferase (MGMT) gene promoter methylation status is acknowledged as a predictive marker for temozolomide (TMZ) treatment. When MGMT promoter values fall into a "methylated" range, a better response to chemotherapy is expected. However, a cutoff that discriminates between "methylated" and "unmethylated" status has yet to be defined. We aimed to identify the best cutoff value and to find out whether variability in methylation profiles influences the predictive capacity of MGMT promoter methylation. Data from 105 GBM patients treated between 2008 and 2013 were analyzed. MGMT promoter methylation status was determined by analyzing 10 CpG islands by pyrosequencing. Patients were treated with radiotherapy followed by TMZ. MGMT promoter methylation status was classified into unmethylated 0-9 %, methylated 10-29 % and methylated 30-100 %. Statistical analysis showed that an assumed methylation cutoff of 9 % led to an overestimation of responders. All patients in the 10-29 % methylation group relapsed before the 18-month evaluation. Patients with a methylation status ≥30 % showed a median overall survival of 25.2 months compared to 15.2 months in all other patients, confirming this value as the best methylation cutoff. Despite wide variability among individual profiles, single CpG island analysis did not reveal any correlation between single CpG island methylation values and relapse or death. Specific CpG island methylation status did not influence the predictive value of MGMT. The predictive role of MGMT promoter methylation was maintained only with a cutoff value ≥30 %.
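
    The three methylation bands and the ≥30 % predictive cutoff described in the record above can be written as a small lookup. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and placing non-integer values such as 9.5 % in the lower band is an assumption, since the abstract gives only integer ranges.

```python
def mgmt_group(pct):
    """Map an MGMT promoter methylation percentage to the bands in the abstract."""
    if not 0 <= pct <= 100:
        raise ValueError("methylation percentage must lie between 0 and 100")
    if pct < 10:            # abstract: unmethylated 0-9 %
        return "unmethylated"
    if pct < 30:            # abstract: methylated 10-29 %
        return "methylated 10-29 %"
    return "methylated 30-100 %"


def above_predictive_cutoff(pct, cutoff=30.0):
    """True when methylation reaches the cutoff the abstract reports as predictive."""
    return pct >= cutoff


groups = [mgmt_group(p) for p in (4, 18, 30, 75)]
```

    Keeping the cutoff as a parameter makes it straightforward to compare the 9 % and 30 % thresholds discussed in the abstract on the same data.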

  4. A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury.

    Science.gov (United States)

    Kohonen, Pekka; Parkkinen, Juuso A; Willighagen, Egon L; Ceder, Rebecca; Wennerberg, Krister; Kaski, Samuel; Grafström, Roland C

    2017-07-03

    Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we utilize a 'big data compacting and data fusion'-concept to capture diverse adverse outcomes on cellular and organismal levels. The approach generates from transcriptomics data set a 'predictive toxicogenomics space' (PTGS) tool composed of 1,331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Involving ∼2.5 × 10^8 data points and 1,300 compounds to construct and validate the PTGS, the tool serves to: explain dose-dependent cytotoxicity effects, provide a virtual cytotoxicity probability estimate intrinsic to omics data, predict chemically-induced pathological states in liver resulting from repeated dosing of rats, and furthermore, predict human drug-induced liver injury (DILI) from hepatocyte experiments. Analysing 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, leading to a hereto-unseen level of DILI prediction accuracy.

  5. Applicability of a gene expression based prediction method to SD and Wistar rats: an example of CARCINOscreen®.

    Science.gov (United States)

    Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro

    2015-12-01

    Recently, the development of several gene expression-based prediction methods has been attempted in the field of toxicology. CARCINOscreen® is a gene expression-based screening method to predict carcinogenicity of chemicals which target the liver with high accuracy. In this study, we investigated the applicability of the gene expression-based screening method to SD and Wistar rats by using CARCINOscreen®, originally developed with F344 rats, with two carcinogens, 2,4-diaminotoluene and thioacetamide, and two non-carcinogens, 2,6-diaminotoluene and sodium benzoate. After the 28-day repeated dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver. Obtained gene expression data were applied to CARCINOscreen®. Predictive scores obtained by the CARCINOscreen® for known carcinogens were > 2 in all strains of rats, while non-carcinogens gave prediction scores below 0.5. These results suggested that the gene expression-based screening method, CARCINOscreen®, can be applied to SD and Wistar rats, widely used strains in toxicological studies, by setting an appropriate boundary line of prediction score to classify the chemicals into carcinogens and non-carcinogens.

  6. Improved gene prediction by principal component analysis based autoregressive Yule-Walker method.

    Science.gov (United States)

    Roy, Manidipa; Barman, Soma

    2016-01-10

    Spectral analysis using Fourier techniques is popular with gene prediction because of its simplicity. Model-based autoregressive (AR) spectral estimation gives better resolution even for small DNA segments but selection of appropriate model order is a critical issue. In this article a technique has been proposed where Yule-Walker autoregressive (YW-AR) process is combined with principal component analysis (PCA) for reduction in dimensionality. The spectral peaks of DNA signal are used to detect protein-coding regions based on the 1/3 frequency component. Here optimal model order selection is no more critical as noise is removed by PCA prior to power spectral density (PSD) estimation. Eigenvalue-ratio is used to find the threshold between signal and noise subspaces for data reduction. Superiority of proposed method over fast Fourier Transform (FFT) method and autoregressive method combined with wavelet packet transform (WPT) is established with the help of receiver operating characteristics (ROC) and discrimination measure (DM) respectively.
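
    The 1/3-frequency component mentioned in the record above arises from codon periodicity in protein-coding DNA. The sketch below illustrates only the plain Fourier baseline that the abstract compares against, not the proposed PCA-based Yule-Walker method; the function names, the binary base-indicator mapping, and the window and step sizes are assumptions made for illustration.

```python
import cmath


def period3_power(seq):
    """Sum of DFT power at frequency 1/3 over the four base-indicator sequences."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at k = N/3 of the 0/1 indicator sequence for this base
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=90, step=3):
    """Yield (start, power) per window; high values hint at codon periodicity."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, period3_power(seq[start:start + window])


# A strictly 3-periodic toy sequence versus a 4-periodic one.
periodic = period3_power("ATG" * 30)
aperiodic = period3_power("ACGT" * 30)
profile = list(sliding_period3("ATG" * 40))
```

    In this toy case the 3-periodic sequence yields a large value while the 4-periodic one cancels to essentially zero; real coding regions give a weaker but still detectable peak, which is why the choice of window length matters for short DNA segments.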

  7. Short communication: genetic variability in the predicted microRNA target sites of caprine casein genes.

    Science.gov (United States)

    Zidi, A; Amills, M; Tomás, A; Vidal, O; Ramírez, O; Carrizosa, J; Urrutia, B; Serradilla, J M; Clop, A

    2010-04-01

    The main goal of the current work was to identify single nucleotide polymorphisms (SNP) that might create or disrupt microRNA (miRNA) target sites in the caprine casein genes. The 3' untranslated regions of the goat alpha(S1)-, alpha(S2)-, beta-, and kappa-casein genes (CSN1S1, CSN1S2, CSN2, and CSN3, respectively) were resequenced in 25 individuals of the Murciano-Granadina, Cashmere, Canarian, Saanen, and Sahelian breeds. Five SNP were identified through this strategy: c.175C>T at CSN1S1; c.109T>C, c.139G>C, and c.160T>C at CSN1S2; and c.216C>T at CSN2. Analysis with the Patrocles Finder tool predicted that all of these SNP are located within regions complementary to the seed of diverse miRNA sequences. These in silico results suggest that polymorphism at miRNA target sites might have some effect on casein expression. We explored this issue by genotyping the c.175C>T SNP (CSN1S1) in 85 Murciano-Granadina goats with records for milk CSN1S1 concentrations. This substitution destroys a putative target site for miR-101, a miRNA known to be expressed in the bovine mammary gland. Although TT goats had higher levels (6.25 g/L) of CSN1S1 than their CT (6.05 g/L) and CC (6.04 g/L) counterparts, these differences were not significant. Experimental confirmation of the miRNA target sites predicted in the current work and performance of additional association analyses in other goat populations will be an essential step to find out if polymorphic miRNA target sites constitute an important source of variation in casein expression.

  8. A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

    Directory of Open Access Journals (Sweden)

    Meysam Bastani

    Full Text Available BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. RESULTS: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. CONCLUSIONS: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.

  9. A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

    Science.gov (United States)

    Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell

    2013-01-01

    Background: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. Results: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. Conclusions: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637

  10. Polymorphisms in genes involved in EGFR-turnover are predictive for cetuximab efficacy in colorectal cancer

    Science.gov (United States)

    Stintzing, Sebastian; Zhang, Wu; Heinemann, Volker; Neureiter, Daniel; Kemmerling, Ralf; Kirchner, Thomas; Jung, Andreas; Folwaczny, Matthias; Yang, Dongyun; Ning, Yan; Sebio, Ana; Stremitzer, Stefan; Sunakawa, Yu; Matsusaka, Satoshi; Yamauchi, Shinichi; Loupakis, Fotios; Cremolini, Chiara; Falcone, Alfredo; Lenz, Heinz-Josef

    2015-01-01

    Transmembrane receptors such as the epidermal growth factor receptor (EGFR) are regulated by their turnover, which is dependent on the ubiquitin-proteasome-system (UPS). We tested in two independent study cohorts whether single nucleotide polymorphisms (SNPs) in genes involved in EGFR turnover predict clinical outcome in cetuximab treated metastatic colorectal cancer patients. The following SNPs involved in EGFR degradation were analyzed in a screening cohort of 108 patients treated with cetuximab in the chemorefractory setting: c-CBL (rs7105971; rs4938637; rs4938638; rs251837), EPS15 (rs17567; rs7308; rs1065754), NAE1 (rs363169; rs363170; rs363172); SH3KBP1 (rs7051590; rs5955820; rs1017874; rs11795873); SGIP1 (rs604737; rs6570808; rs7526812); UBE2M (rs895364; rs895374); UBE2L3 (rs5754216). SNPs showing an association with response or survival were analyzed in BRAF and RAS wild-type samples from the FIRE-3 study. 153 FOLFIRI plus cetuximab treated patients served as validation set, 168 patients of the FOLFIRI plus bevacizumab arm served as controls. EGFR FISH was done in 138 samples to test whether significant SNPs were associated with EGFR expression. UBE2M rs895374 was significantly associated with PFS (logrank-p = 0.005; HR 0.60) within cetuximab treated patients. No association with bevacizumab treated patients (n=168) could be established (p= 0.56, HR: 0.90). rs895374 genotype did not affect EGFR FISH measurements. EGFR recycling is an interesting mechanism of secondary resistance to cetuximab in mCRC. This is the first report suggesting that germline polymorphisms in the degradation process predict efficacy of cetuximab in patients with mCRC. Genes involved in EGFR turnover may be new targets in the treatment of mCRC. PMID:26206335

  11. An Eighteen-Gene Classifier Predicts Locoregional Recurrence in Post-Mastectomy Breast Cancer Patients

    Directory of Open Access Journals (Sweden)

    Skye H. Cheng

    2016-03-01

    Full Text Available We previously identified 34 genes of interest (GOI) in 2006 to help oncologists determine whether post-mastectomy radiotherapy (PMRT) is indicated for certain patients with breast cancer. At this time, an independent cohort of 135 patients with DNA microarray data available from primary tumor tissue samples was chosen. Inclusion criteria were (1) mastectomy as the first treatment, (2) pathology stages I-III, (3) any locoregional recurrence (LRR) and (4) no PMRT. After inter-platform data integration of Affymetrix U95 and U133 Plus 2.0 arrays and quantile normalization, in this paper we used 18 of the 34 GOI to divide the mastectomy patients into high- and low-risk groups. The 5-year rate of freedom from LRR in the high-risk group was 30%. In contrast, in the low-risk group it was 99% (p<0.0001). Multivariate analysis revealed that the 18-gene classifier independently predicts rates of LRR regardless of nodal status or cancer subtype.

  12. Xenobiotic metabolizing enzyme gene polymorphisms predict response to lung volume reduction surgery

    Directory of Open Access Journals (Sweden)

    DeMeo Dawn L

    2007-08-01

    Full Text Available Background: In the National Emphysema Treatment Trial (NETT), marked variability in response to lung volume reduction surgery (LVRS) was observed. We sought to identify genetic differences which may explain some of this variability. Methods: In 203 subjects from the NETT Genetics Ancillary Study, four outcome measures were used to define response to LVRS at six months: modified BODE index, post-bronchodilator FEV1, maximum work achieved on a cardiopulmonary exercise test, and University of California, San Diego shortness of breath questionnaire. Sixty-four single nucleotide polymorphisms (SNPs) were genotyped in five genes previously shown to be associated with chronic obstructive pulmonary disease susceptibility, exercise capacity, or emphysema distribution. Results: A SNP upstream from glutathione S-transferase pi (GSTP1; p = 0.003) and a coding SNP in microsomal epoxide hydrolase (EPHX1; p = 0.02) were each associated with change in BODE score. These effects appeared to be strongest in patients in the non-upper lobe predominant, low exercise subgroup. A promoter SNP in EPHX1 was associated with change in BODE score (p = 0.008), with the strongest effects in patients with upper lobe predominant emphysema and low exercise capacity. One additional SNP in GSTP1 and three additional SNPs in EPHX1 were associated (p […]). Conclusion: Genetic variants in GSTP1 and EPHX1, two genes encoding xenobiotic metabolizing enzymes, were predictive of response to LVRS. These polymorphisms may identify patients most likely to benefit from LVRS.

  13. Predicting childhood effortful control from interactions between early parenting quality and children's dopamine transporter gene haplotypes.

    Science.gov (United States)

    Li, Yi; Sulik, Michael J; Eisenberg, Nancy; Spinrad, Tracy L; Lemery-Chalfant, Kathryn; Stover, Daryn A; Verrelli, Brian C

    2016-02-01

    Children's observed effortful control (EC) at 30, 42, and 54 months (n = 145) was predicted from the interaction between mothers' observed parenting with their 30-month-olds and three variants of the solute carrier family C6, member 3 (SLC6A3) dopamine transporter gene (single nucleotide polymorphisms in intron8 and intron13, and a 40 base pair variable number tandem repeat [VNTR] in the 3'-untranslated region [UTR]), as well as haplotypes of these variants. Significant moderating effects were found. Children without the intron8-A/intron13-G, intron8-A/3'-UTR VNTR-10, or intron13-G/3'-UTR VNTR-10 haplotypes (i.e., haplotypes associated with the reduced SLC6A3 gene expression and thus lower dopamine functioning) appeared to demonstrate altered levels of EC as a function of maternal parenting quality, whereas children with these haplotypes demonstrated a similar EC level regardless of the parenting quality. Children with these haplotypes demonstrated a trade-off, such that they showed higher EC, relative to their counterparts without these haplotypes, when exposed to less supportive maternal parenting. The findings revealed a diathesis-stress pattern and suggested that different SLC6A3 haplotypes, but not single variants, might represent different levels of young children's sensitivity/responsivity to early parenting.

  14. Multiple genetic interaction experiments provide complementary information useful for gene function prediction.

    Directory of Open Access Journals (Sweden)

    Magali Michaut

    Full Text Available Genetic interactions help map biological processes and their functional relationships. A genetic interaction is defined as a deviation from the expected phenotype when combining multiple genetic mutations. In Saccharomyces cerevisiae, most genetic interactions are measured under a single phenotype - growth rate in standard laboratory conditions. Recently genetic interactions have been collected under different phenotypic readouts and experimental conditions. How different are these networks and what can we learn from their differences? We conducted a systematic analysis of quantitative genetic interaction networks in yeast performed under different experimental conditions. We find that networks obtained using different phenotypic readouts, in different conditions and from different laboratories overlap less than expected and provide significant unique information. To exploit this information, we develop a novel method to combine individual genetic interaction data sets and show that the resulting network improves gene function prediction performance, demonstrating that individual networks provide complementary information. Our results support the notion that using diverse phenotypic readouts and experimental conditions will substantially increase the amount of gene function information produced by genetic interaction screens.

  15. Interactions of adolescent social experiences and dopamine genes to predict physical intimate partner violence perpetration

    Science.gov (United States)

    Parker, Edith A.; Peek-Asa, Corinne

    2017-01-01

    Objectives: We examined the interactions between three dopamine gene alleles (DAT1, DRD2, DRD4) previously associated with violent behavior and two components of the adolescent environment (exposure to violence, school social environment) to predict adulthood physical intimate partner violence (IPV) perpetration among white men and women. Methods: We used data from Wave IV of the National Longitudinal Study of Adolescent to Adult Health, a cohort study following individuals from adolescence to adulthood. Based on the prior literature, we categorized participants as at risk for each of the three dopamine genes using this coding scheme: two 10-R alleles for DAT1; at least one A-1 allele for DRD2; at least one 7-R or 8-R allele for DRD4. Adolescent exposure to violence and school social environment were measured in 1994 and 1995 when participants were in high school or middle school. Intimate partner violence perpetration was measured in 2008 when participants were 24 to 32 years old. We used simple and multivariable logistic regression models, including interactions of genes and the adolescent environments, for the analysis. Results: Presence of risk alleles was not independently associated with IPV perpetration, but increasing exposure to violence and disconnection from the school social environment were associated with physical IPV perpetration. The effects of these adolescent experiences on physical IPV perpetration varied by dopamine risk allele status. Among individuals with non-risk dopamine alleles, increased exposure to violence during adolescence and perception of disconnection from the school environment were significantly associated with increased odds of physical IPV perpetration, but individuals with high-risk alleles, overall, did not experience the same increase. Conclusion: Our results suggested that the effects of adolescent environment on adulthood physical IPV perpetration varied by genetic factors. This analysis did not find a direct link between risk alleles
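
    The risk-allele coding scheme described in the Methods above can be sketched as a small function. This is only an illustration under stated assumptions: the function name and the allele labels ("10R", "A1", "7R", "8R") are hypothetical encodings chosen here, not the study's actual variable names or data format.

```python
from typing import Tuple


def at_risk(gene: str, alleles: Tuple[str, str]) -> bool:
    """Return True if a two-allele genotype meets the abstract's risk rule.

    Allele labels are hypothetical encodings for this sketch.
    """
    if gene == "DAT1":
        # Risk requires two 10-R alleles.
        return alleles.count("10R") == 2
    if gene == "DRD2":
        # Risk requires at least one A-1 allele.
        return "A1" in alleles
    if gene == "DRD4":
        # Risk requires at least one 7-R or 8-R allele.
        return any(a in ("7R", "8R") for a in alleles)
    raise ValueError(f"no coding rule defined for gene {gene!r}")
```

    For example, under this encoding a DAT1 genotype of ("10R", "9R") would be coded as not at risk, because only one of the two alleles is 10-R.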

  16. Prediction of Enhancement Effect of Nitroimidazoles on Irradiation by Gene Expression Programming

    Institute of Scientific and Technical Information of China (English)

    LONG Wei; ZHANG Xiao-dong; WANG Hao; SHEN Xiu; SI Hong-zong; FAN Sai-jun; ZHOU Ze-wei

    2013-01-01

    A novel machine learning method, gene expression programming (GEP), was employed to build quantitative structure-activity relationship (QSAR) models for predicting the enhancement effect of nitroimidazole compounds on irradiation. The models were based on descriptors calculated from the molecular structures. Four descriptors were selected from the pool of descriptors by the best multiple linear regression (BMLR) method. After that, three regression methods, multiple linear regression (MLR), support vector machine (SVM) and GEP, were used to build QSAR models. Compared to MLR and SVM, GEP produced a better model, with a squared correlation coefficient (R2) of 0.9203 and 0.9014 and a root mean square error (RMSE) of 0.6187 and 0.6875 for the training set and test set, respectively. The results show that the GEP model has better predictive ability and is more reliable than the MLR and SVM models. This indicates that GEP is a promising method for related research in the radiation field.
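
    The two fit statistics quoted above, the square of the correlation coefficient (R2) and the root mean square error (RMSE), can be computed as in the following minimal sketch. The function names are illustrative, and the sketch assumes R2 means the squared Pearson correlation between observed and predicted values, as the abstract's wording suggests; it does not reproduce the authors' models or data.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)
```

    Note that a squared correlation can be high even when predictions are systematically offset, which is one reason the two statistics are usually reported together.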

  17. Melanopsin gene variations interact with season to predict sleep onset and chronotype.

    Science.gov (United States)

    Roecklein, Kathryn A; Wong, Patricia M; Franzen, Peter L; Hasler, Brant P; Wood-Vasey, W Michael; Nimgaonkar, Vishwajit L; Miller, Megan A; Kepreos, Kyle M; Ferrell, Robert E; Manuck, Stephen B

    2012-10-01

    The human melanopsin gene has been reported to mediate risk for seasonal affective disorder (SAD), which is hypothesized to be caused by decreased photic input during winter when light levels fall below threshold, resulting in differences in circadian phase and/or sleep. However, it is unclear if melanopsin increases risk of SAD by causing differences in sleep or circadian phase, or if those differences are symptoms of the mood disorder. To determine if melanopsin sequence variations are associated with differences in sleep-wake behavior among those not suffering from a mood disorder, the authors tested associations between melanopsin gene polymorphisms and self-reported sleep timing (sleep onset and wake time) in a community sample (N = 234) of non-Hispanic Caucasian participants (age 30-54 yrs) with no history of psychological, neurological, or sleep disorders. The authors also tested the effect of melanopsin variations on differences in preferred sleep and activity timing (i.e., chronotype), which may reflect differences in circadian phase, sleep homeostasis, or both. Daylength on the day of assessment was measured and included in analyses. DNA samples were genotyped for melanopsin gene polymorphisms using fluorescence polarization. P10L genotype interacted with daylength to predict self-reported sleep onset (interaction p sleep onset among those with the TT genotype was later in the day when individuals were assessed on longer days and earlier in the day on shorter days, whereas individuals in the other genotype groups (i.e., CC and CT) did not show this interaction effect. P10L genotype also interacted in an analogous way with daylength to predict self-reported morningness (interaction p sleep onset and chronotype as a function of daylength, whereas other genotypes at P10L do not seem to have effects that vary by daylength. A better understanding of how melanopsin confers heightened responsivity to daylength may improve our understanding of a broad range of

  18. Peripheral neuropathy predicts nuclear gene defect in patients with mitochondrial ophthalmoplegia.

    Science.gov (United States)

    Horga, Alejandro; Pitceathly, Robert D S; Blake, Julian C; Woodward, Catherine E; Zapater, Pedro; Fratter, Carl; Mudanohwo, Ese E; Plant, Gordon T; Houlden, Henry; Sweeney, Mary G; Hanna, Michael G; Reilly, Mary M

    2014-12-01

    Progressive external ophthalmoplegia is a common clinical feature in mitochondrial disease caused by nuclear DNA defects and single, large-scale mitochondrial DNA deletions and is less frequently associated with point mutations of mitochondrial DNA. Peripheral neuropathy is also a frequent manifestation of mitochondrial disease, although its prevalence and characteristics vary considerably among the different syndromes and genetic aetiologies. Based on clinical observations, we systematically investigated whether the presence of peripheral neuropathy could predict the underlying genetic defect in patients with progressive external ophthalmoplegia. We analysed detailed demographic, clinical and neurophysiological data from 116 patients with genetically-defined mitochondrial disease and progressive external ophthalmoplegia. Seventy-eight patients (67%) had a single mitochondrial DNA deletion, 12 (10%) had a point mutation of mitochondrial DNA and 26 (22%) had mutations in either POLG, C10orf2 or RRM2B, or had multiple mitochondrial DNA deletions in muscle without an identified nuclear gene defect. Seventy-seven patients had neurophysiological studies; of these, 16 patients (21%) had a large-fibre peripheral neuropathy. The prevalence of peripheral neuropathy was significantly lower in patients with a single mitochondrial DNA deletion (2%) as compared to those with a point mutation of mitochondrial DNA or with a nuclear DNA defect (44% and 52%, respectively; P […] neuropathy as the only independent predictor associated with a nuclear DNA defect (P=0.002; odds ratio 8.43, 95% confidence interval 2.24-31.76). Multinomial logistic regression analysis identified peripheral neuropathy, family history and hearing loss as significant predictors of the genotype, and the same three variables showed the highest performance in genotype classification in a decision tree analysis.
Of these variables, peripheral neuropathy had the highest specificity (91%), negative predictive value

  19. Gene Expression Versus Sequence for Predicting Function: Glia Maturation Factor Gamma Is Not A Glia Maturation Factor

    Institute of Scientific and Technical Information of China (English)

    Michael G. Walker

    2003-01-01

    It is standard practice, whenever a researcher finds a new gene, to search databases for genes that have a similar sequence. It is not standard practice, whenever a researcher finds a new gene, to search for genes that have similar expression (coexpression). Failure to perform co-expression searches has led to incorrect conclusions about the likely function of new genes, and has led to wasted laboratory attempts to confirm functions incorrectly predicted. We present here the example of Glia Maturation Factor gamma (GMF-gamma). Despite its name, it has not been shown to participate in glia maturation. It is a gene of unknown function that is similar in sequence to GMF-beta. The sequence homology and chromosomal location led to an unsuccessful search for GMF-gamma mutations in glioma. We examined GMF-gamma expression in 1432 human cDNA libraries. Highest expression occurs in phagocytic, antigen-presenting and other hematopoietic cells. We found GMF-gamma mRNA in almost every tissue examined, with expression in nervous tissue no higher than in any other tissue. Our evidence indicates that GMF-gamma participates in phagocytosis in antigen-presenting cells. Searches for genes with similar sequences should be supplemented with searches for genes with similar expression to avoid incorrect predictions.

  20. Gene Expression Versus Sequence for Predicting Function: Glia Maturation Factor Gamma Is Not A Glia Maturation Factor

    Institute of Scientific and Technical Information of China (English)

    Michael G. Walker

    2003-01-01

    It is standard practice, whenever a researcher finds a new gene, to search databases for genes that have a similar sequence. It is not standard practice, whenever a researcher finds a new gene, to search for genes that have similar expression (coexpression). Failure to perform co-expression searches has led to incorrect conclusions about the likely function of new genes, and has led to wasted laboratory attempts to confirm functions incorrectly predicted. We present here the example of Glia Maturation Factor gamma (GMF-gamma). Despite its name, it has not been shown to participate in glia maturation. It is a gene of unknown function that is similar in sequence to GMF-beta. The sequence homology and chromosomal location led to an unsuccessful search for GMF-gamma mutations in glioma. We examined GMF-gamma expression in 1432 human cDNA libraries. Highest expression occurs in phagocytic, antigen-presenting and other hematopoietic cells. We found GMF-gamma mRNA in almost every tissue examined, with expression in nervous tissue no higher than in any other tissue. Our evidence indicates that GMF-gamma participates in phagocytosis in antigen-presenting cells. Searches for genes with similar sequences should be supplemented with searches for genes with similar expression to avoid incorrect predictions.
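
    A co-expression search of the kind the abstract recommends can be sketched as ranking genes by how closely their expression profiles track a query gene across libraries. The sketch below is an assumption-laden illustration: it uses Pearson correlation as the similarity measure, which the abstract does not specify, and the gene names and expression values are made-up placeholders rather than data from the 1432 cDNA libraries.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def rank_coexpressed(
    query: str, profiles: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank all other genes by correlation with the query gene's profile."""
    q = profiles[query]
    scores = [(g, pearson(q, v)) for g, v in profiles.items() if g != query]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical expression levels across four libraries (placeholder values).
profiles = {
    "GENE_Q": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_coexpressed("GENE_Q", profiles)
```

    In this toy input, GENE_A rises and falls with the query gene and so ranks first, while GENE_B moves in the opposite direction and ranks last.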

  1. High accordance in prognosis prediction of colorectal cancer across independent datasets by multi-gene module expression profiles.

    Directory of Open Access Journals (Sweden)

    Wenting Li

    Full Text Available A considerable portion of patients with colorectal cancer have a high risk of disease recurrence after surgery. These patients can be identified by analyzing the expression profiles of signature genes in tumors. But there is no consensus on which genes should be used, and the performance of a specific set of signature genes varies greatly across datasets, impeding their implementation in routine clinical application. Instead of using individual genes, here we identified functional multi-gene modules with significant expression changes between recurrent and recurrence-free tumors and used them as signatures for predicting colorectal cancer recurrence in multiple datasets that were collected independently and profiled on different microarray platforms. The multi-gene modules we identified have a significant enrichment of known genes and biological processes relevant to cancer development, including genes from the chemokine pathway. Most strikingly, they showed a significant enrichment of somatic mutations found in colorectal cancer. These results confirmed the functional relevance of these modules for colorectal cancer development. Further, the functional modules from different datasets overlapped significantly. Finally, we demonstrated that, by leveraging the above information on these modules, our module-based classifier avoided arbitrarily fitting the classifier function and screening the signatures using the training data, and achieved more consistent prognosis prediction across three independent datasets, which held even with very small training sets of tumors.

  2. CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations.

    Science.gov (United States)

    Park, Julie; Costanzo, Maria C; Balakrishnan, Rama; Cherry, J Michael; Hong, Eurie L

    2012-01-01

    The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Due to their relevance to many areas of research, ensuring the accuracy and quality of these annotations is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method will identify genes that require annotation updates, taking an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets to find in particular those genes that are missing literature-supported annotations, but our survey found that there are no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.

  3. Gene expression markers in circulating tumor cells may predict bone metastasis and response to hormonal treatment in breast cancer.

    Science.gov (United States)

    Wang, Haiying; Molina, Julian; Jiang, John; Ferber, Matthew; Pruthi, Sandhya; Jatkoe, Timothy; Derecho, Carlo; Rajpurohit, Yashoda; Zheng, Jian; Wang, Yixin

    2013-11-01

    Circulating tumor cells (CTCs) have recently attracted attention due to their potential as prognostic and predictive markers for the clinical management of metastatic breast cancer patients. The isolation of CTCs from patients may enable the molecular characterization of these cells, which may help establish a minimally invasive assay for the prediction of metastasis and further optimization of treatment. Molecular markers of proven clinical value may therefore be useful in predicting disease aggressiveness and response to treatment. In our earlier study, we identified a gene signature in breast cancer that appears to be significantly associated with bone metastasis. Among the genes that constitute this signature, trefoil factor 1 (TFF1) was identified as the most differentially expressed gene associated with bone metastasis. In this study, we investigated 25 candidate gene markers in the CTCs of metastatic breast cancer patients with different metastatic sites. The panel of the 25 markers was investigated in 80 baseline samples (first blood draw of CTCs) and 30 follow-up samples. In addition, 40 healthy blood donors (HBDs) were analyzed as controls. The assay was performed using quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) with RNA extracted from CTCs captured by the CellSearch system. Our study indicated that 12 of the genes were uniquely expressed in CTCs and 10 were highly expressed in the CTCs obtained from patients compared to those obtained from HBDs. Among these genes, the expression of keratin 19 was highly correlated with the CTC count. The TFF1 expression in CTCs was a strong predictor of bone metastasis and the patients with a high expression of estrogen receptor β in CTCs exhibited a better response to hormonal treatment. Molecular characterization of these genes in CTCs may provide a better understanding of the mechanism underlying tumor metastasis and identify gene markers in CTCs for predicting disease progression and

  4. Expression of tumor necrosis factor-alpha-mediated genes predicts recurrence-free survival in lung cancer.

    Science.gov (United States)

    Wang, Baohua; Song, Ning; Yu, Tong; Zhou, Lianya; Zhang, Helin; Duan, Lin; He, Wenshu; Zhu, Yihua; Bai, Yunfei; Zhu, Miao

    2014-01-01

    In this study, we conducted a meta-analysis on high-throughput gene expression data to identify TNF-α-mediated genes implicated in lung cancer. We first investigated the gene expression profiles of two independent TNF-α/TNFR KO murine models. The EGF receptor signaling pathway was the top pathway associated with genes mediated by TNF-α. After matching the TNF-α-mediated mouse genes to their human orthologs, we compared the expression patterns of the TNF-α-mediated genes in normal and tumor lung tissues obtained from humans. Based on the TNF-α-mediated genes that were dysregulated in lung tumors, we developed a prognostic gene signature that effectively predicted recurrence-free survival in lung cancer in two validation cohorts. Resampling tests suggested that the prognostic power of the gene signature was not by chance, and multivariate analysis suggested that this gene signature was independent of the traditional clinical factors and enhanced the identification of lung cancer patients at greater risk for recurrence.

  5. Expression of tumor necrosis factor-alpha-mediated genes predicts recurrence-free survival in lung cancer.

    Directory of Open Access Journals (Sweden)

    Baohua Wang

    Full Text Available In this study, we conducted a meta-analysis on high-throughput gene expression data to identify TNF-α-mediated genes implicated in lung cancer. We first investigated the gene expression profiles of two independent TNF-α/TNFR KO murine models. The EGF receptor signaling pathway was the top pathway associated with genes mediated by TNF-α. After matching the TNF-α-mediated mouse genes to their human orthologs, we compared the expression patterns of the TNF-α-mediated genes in normal and tumor lung tissues obtained from humans. Based on the TNF-α-mediated genes that were dysregulated in lung tumors, we developed a prognostic gene signature that effectively predicted recurrence-free survival in lung cancer in two validation cohorts. Resampling tests suggested that the prognostic power of the gene signature was not by chance, and multivariate analysis suggested that this gene signature was independent of the traditional clinical factors and enhanced the identification of lung cancer patients at greater risk for recurrence.

  6. Gene-expression signature predicts postoperative recurrence in stage I non-small cell lung cancer patients.

    Science.gov (United States)

    Lu, Yan; Wang, Liang; Liu, Pengyuan; Yang, Ping; You, Ming

    2012-01-01

    About 30% of stage I non-small cell lung cancer (NSCLC) patients undergoing resection will recur. Robust prognostic markers are required to better manage therapy options. The purpose of this study is to develop and validate a novel gene-expression signature that can predict tumor recurrence in stage I NSCLC patients. Cox proportional hazards regression analysis was performed to identify recurrence-related genes and a partial Cox regression model was used to generate a gene signature of recurrence in the training dataset, 142 stage I lung adenocarcinomas without adjunctive therapy from the Director's Challenge Consortium. Four independent validation datasets, including GSE5843, GSE8894, and two other datasets provided by Mayo Clinic and Washington University, were used to assess the prediction accuracy by calculating the correlation between risk score estimated from gene expression and real recurrence-free survival time and the AUC of time-dependent ROC analysis. Pathway-based survival analyses were also performed. 104 probesets correlated with recurrence in the training dataset. They are enriched in cell adhesion, apoptosis and regulation of cell proliferation. A 51-gene expression signature was identified to distinguish patients likely to develop tumor recurrence (Dxy = -0.83, P […] 85%. Multiple pathways including leukocyte transendothelial migration and cell adhesion were highly correlated with recurrence-free survival. The gene signature is highly predictive of recurrence in stage I NSCLC patients, which has important prognostic and therapeutic implications for the future management of these patients.

  7. Gene-expression signature predicts postoperative recurrence in stage I non-small cell lung cancer patients.

    Directory of Open Access Journals (Sweden)

    Yan Lu

    Full Text Available About 30% of stage I non-small cell lung cancer (NSCLC) patients undergoing resection will recur. Robust prognostic markers are required to better manage therapy options. The purpose of this study is to develop and validate a novel gene-expression signature that can predict tumor recurrence in stage I NSCLC patients. Cox proportional hazards regression analysis was performed to identify recurrence-related genes and a partial Cox regression model was used to generate a gene signature of recurrence in the training dataset, 142 stage I lung adenocarcinomas without adjunctive therapy from the Director's Challenge Consortium. Four independent validation datasets, including GSE5843, GSE8894, and two other datasets provided by Mayo Clinic and Washington University, were used to assess the prediction accuracy by calculating the correlation between risk score estimated from gene expression and real recurrence-free survival time and the AUC of time-dependent ROC analysis. Pathway-based survival analyses were also performed. 104 probesets correlated with recurrence in the training dataset. They are enriched in cell adhesion, apoptosis and regulation of cell proliferation. A 51-gene expression signature was identified to distinguish patients likely to develop tumor recurrence (Dxy = -0.83, P […] 85%. Multiple pathways including leukocyte transendothelial migration and cell adhesion were highly correlated with recurrence-free survival. The gene signature is highly predictive of recurrence in stage I NSCLC patients, which has important prognostic and therapeutic implications for the future management of these patients.

  8. Landscape genetics as a tool for conservation planning: predicting the effects of landscape change on gene flow.

    Science.gov (United States)

    van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine

    2014-03-01

    For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (landscape planning.

  9. Sex-linkage of sexually antagonistic genes is predicted by female, but not male, effects in birds.

    Science.gov (United States)

    Mank, Judith E; Ellegren, Hans

    2009-06-01

    Evolutionary theory predicts that sexually antagonistic loci will be preferentially sex-linked, and this association can be empirically tested with data on sex-biased gene expression with the assumption that sex-biased gene expression represents the resolution of past sexual antagonism. However, incomplete dosage compensating mechanisms and meiotic sex chromosome inactivation have hampered efforts to connect expression data to theoretical predictions regarding the genomic distribution of sexually antagonistic loci in a variety of animals. Here we use data on the underlying regulatory mechanisms that produce expression sex-bias to test the genomic distribution of sexually antagonistic genes in chicken. Using this approach, which is free from problems associated with the lack of dosage compensation in birds, we show that female-detriment genes are significantly overrepresented on the Z chromosome, and female-benefit genes underrepresented. By contrast, male-effect genes show no over- or underrepresentation on the Z chromosome. These data are consistent with a dominant mode of inheritance for sexually antagonistic genes, in which male-benefit coding mutations are more likely to be fixed on the Z due to stronger male-specific selective pressures. After fixation of male-benefit alleles, regulatory changes in females evolve to minimize antagonism by reducing female expression.

  10. In Silico Analysis of Microarray-Based Gene Expression Profiles Predicts Tumor Cell Response to Withanolides

    Directory of Open Access Journals (Sweden)

    Thomas Efferth

    2012-05-01

    Withania somnifera (L.) Dunal (Indian ginseng, winter cherry, Solanaceae) is widely used in traditional medicine. Roots are either chewed or used to prepare beverages (aqueous decocts). The major secondary metabolites of Withania somnifera are the withanolides, which are C-28-steroidal lactone triterpenoids. Withania somnifera extracts exert chemopreventive and anticancer activities in vitro and in vivo. The aims of the present in silico study were, firstly, to investigate whether tumor cells develop cross-resistance between standard anticancer drugs and withanolides and, secondly, to elucidate the molecular determinants of sensitivity and resistance of tumor cells towards withanolides. Using IC50 concentrations of eight different withanolides (withaferin A, withaferin A diacetate, 3-azerininylwithaferin A, withafastuosin D diacetate, 4-B-hydroxy-withanolide E, isowithanololide E, withafastuosin E, and withaperuvin) and 19 established anticancer drugs, we analyzed the cross-resistance profile of 60 tumor cell lines. The cell lines revealed cross-resistance between the eight withanolides. Consistent cross-resistance between withanolides and nitrosoureas (carmustin, lomustin, and semimustin) was also observed. Then, we performed transcriptomic microarray-based COMPARE and hierarchical cluster analyses of mRNA expression to identify mRNA expression profiles predicting sensitivity or resistance towards withanolides. Genes from diverse functional groups were significantly associated with response of tumor cells to withaferin A diacetate, e.g. genes functioning in DNA damage and repair, stress response, cell growth regulation, extracellular matrix components, cell adhesion and cell migration, constituents of the ribosome, cytoskeletal organization and regulation, signal transduction, transcription factors, and others.

  11. Multifactorial Patterns of Gene Expression in Colonic Epithelial Cells Predict Disease Phenotypes in Experimental Colitis

    Science.gov (United States)

    Frantz, Aubrey L.; Bruno, Maria E.C.; Rogier, Eric W.; Tuna, Halide; Cohen, Donald A.; Bondada, Subbarao; Chelvarajan, R. Lakshman; Brandon, J. Anthony; Jennings, C. Darrell; Kaetzel, Charlotte S.

    2012-01-01

    Background: The pathogenesis of inflammatory bowel disease (IBD) is complex and the need to identify molecular biomarkers is critical. Epithelial cells play a central role in maintaining intestinal homeostasis. We previously identified 5 “signature” biomarkers in colonic epithelial cells (CEC) that are predictive of disease phenotype in Crohn’s disease. Here we investigate the ability of CEC biomarkers to define the mechanism and severity of intestinal inflammation. Methods: We analyzed expression of RelA, A20, pIgR, TNF and MIP-2 in CEC of mice with DSS acute colitis or T cell-mediated chronic colitis. Factor analysis was used to combine the 5 biomarkers into 2 multifactorial principal components (PCs). PC scores for individual mice were correlated with disease severity. Results: For both colitis models, PC1 was strongly weighted toward RelA, A20 and pIgR, and PC2 was strongly weighted toward TNF and MIP-2, while the contributions of other biomarkers varied depending on the etiology of inflammation. Disease severity was correlated with elevated PC2 scores in DSS colitis and reduced PC1 scores in T cell transfer colitis. Down-regulation of pIgR was a common feature observed in both colitis models and was associated with altered cellular localization of pIgR and failure to transport IgA. Conclusions: A multifactorial analysis of epithelial gene expression may be more informative than examining single gene responses in IBD. These results provide insight into the homeostatic and pro-inflammatory functions of CEC in IBD pathogenesis and suggest that biomarker analysis could be useful for evaluating therapeutic options for IBD patients. PMID:23070952

  12. A gene expression signature that can predict green tea exposure and chemopreventive efficacy of lung cancer in mice.

    Science.gov (United States)

    Lu, Yan; Yao, Ruisheng; Yan, Ying; Wang, Yian; Hara, Yukihiko; Lubet, Ronald A; You, Ming

    2006-02-15

    Green tea has been shown to be a potent chemopreventive agent against lung tumorigenesis in animal models. Previously, we found that treatment of A/J mice with either green tea (0.6% in water) or a defined green tea catechin extract (polyphenon E; 2.0 g/kg in diet) inhibited lung tumorigenesis. Here, we describe expression profiling of lung tissues derived from these studies to determine the gene expression signature that can predict the exposure and efficacy of green tea in mice. We first profiled global gene expression in normal lungs versus lung tumors to determine genes which might be associated with the tumorigenic process (TUM genes). Gene expression in control tumors and green tea-treated tumors (either green tea or polyphenon E) was compared to determine those TUM genes whose expression levels in green tea-treated tumors returned to levels seen in normal lungs. We established a 17-gene expression profile specific for exposure to effective doses of either green tea or polyphenon E. This gene expression signature was altered both in normal lungs and lung adenomas when mice were exposed to green tea or polyphenon E. These experiments identified patterns of gene expression that both offer clues for green tea's potential mechanisms of action and provide a molecular signature specific for green tea exposure.

  13. Gene-Gene-Environment Interactions of Serotonin Transporter, Monoamine Oxidase A and Childhood Maltreatment Predict Aggressive Behavior in Chinese Adolescents

    Science.gov (United States)

    Zhang, Yun; Ming, Qing-sen; Yi, Jin-yao; Wang, Xiang; Chai, Qiao-lian; Yao, Shu-qiao

    2017-01-01

    Gene-environment interactions that moderate aggressive behavior have been identified independently in the serotonin transporter (5-HTT) gene and monoamine oxidase A gene (MAOA). The aim of the present study was to investigate epistatic interactions between MAOA-variable number tandem repeat (VNTR), 5-HTT-linked polymorphism (LPR) and child abuse and the effects of these on aggressive tendencies in a group of otherwise healthy adolescents. A group of 546 Chinese male adolescents completed the Child Trauma Questionnaire and Youth Self-Report of the Child Behavior Checklist. Buccal cells were collected for DNA analysis. The effects of childhood abuse, MAOA-VNTR, 5-HTTLPR genotypes and their interactive gene-gene-environmental effects on aggressive behavior were analyzed using a linear regression model. The effect of child maltreatment was significant, and a three-way interaction among MAOA-VNTR, 5-HTTLPR and sexual abuse (SA) relating to aggressive behaviors was identified. Chinese male adolescents with high expression of the MAOA-VNTR allele and 5-HTTLPR “SS” genotype exhibited the highest aggression tendencies with an increase in SA during childhood. The findings reported support aggression being a complex behavior involving the synergistic effects of gene-gene-environment interactions. PMID:28203149

  14. A variant in the KCNQ1 gene predicts future type 2 diabetes and mediates impaired insulin secretion

    DEFF Research Database (Denmark)

    Jonsson, Anna Elisabet; Isomaa, Bo; Tuomi, Tiinamaija;

    2009-01-01

    Two independent genome-wide association studies for type 2 diabetes in Japanese subjects have recently identified common variants in the KCNQ1 gene that are strongly associated with type 2 diabetes. Here we studied whether a common variant in KCNQ1 would influence BMI as well as insulin secretion...... and action and predict future type 2 diabetes in subjects from Sweden and Finland....

  15. Interactions between Serotonin Transporter Gene Haplotypes and Quality of Mothers' Parenting Predict the Development of Children's Noncompliance

    Science.gov (United States)

    Sulik, Michael J.; Eisenberg, Nancy; Lemery-Chalfant, Kathryn; Spinrad, Tracy L.; Silva, Kassondra M.; Eggum, Natalie D.; Betkowski, Jennifer A.; Kupfer, Anne; Smith, Cynthia L.; Gaertner, Bridget; Stover, Daryn A.; Verrelli, Brian C.

    2012-01-01

    The LPR and STin2 polymorphisms of the serotonin transporter gene (SLC6A4) were combined into haplotypes that, together with quality of maternal parenting, were used to predict initial levels and linear change in children's (N = 138) noncompliance and aggression from age 18-54 months. Quality of mothers' parenting behavior was observed when…

  16. Arabidopsis CPR5 is a senescence-regulatory gene with pleiotropic functions as predicted by the evolutionary theory of senescence

    NARCIS (Netherlands)

    Jing, Hai-Chun; Anderson, Lisa; Sturre, Marcel J. G.; Hille, Jacques; Dijkwel, Paul P.

    2007-01-01

    Arabidopsis CPR5 is a senescence-regulatory gene with pleiotropic functions as predicted by the evolutionary theory of senescence Hai-Chun Jing1,2, Lisa Anderson3, Marcel J.G. Sturre1, Jacques Hille1 and Paul P. Dijkwel1,* 1Molecular Biology of Plants, Groningen Biomolecular Sciences and Biotechnolo

  18. A distinct adipose tissue gene expression response to caloric restriction predicts 6-mo weight maintenance in obese subjects

    DEFF Research Database (Denmark)

    Mutch, D. M.; Pers, Tune Hannes; Temanni, M. R.

    2011-01-01

    AT) gene expression during a low-calorie diet (LCD) could be used to differentiate and predict subjects who experience successful short-term weight maintenance from subjects who experience weight regain. Design: Forty white women followed a dietary protocol consisting of an 8-wk LCD phase followed by a 6...

  19. A distinct adipose tissue gene expression response to caloric restriction predicts 6-mo weight maintenance in obese subjects

    DEFF Research Database (Denmark)

    Mutch, D. M.; Pers, Tune Hannes; Temanni, M. R.

    2011-01-01

    fatty acid metabolism, citric acid cycle, oxidative phosphorylation, and apoptosis were regulated differently by the LCD in WM and WR subjects. Conclusion: This study suggests that LCD-induced changes in insulin secretion and scAT gene expression may have the potential to predict successful short...

  20. Gene expression signatures predict outcome in non-muscle invasive bladder carcinoma - a multi-center validation study

    DEFF Research Database (Denmark)

    Andersen, Lars Dyrskjøt; Zieger, Karsten; Real, Francisco X.

    2007-01-01

    PURPOSE: Clinically useful molecular markers predicting the clinical course of patients diagnosed with non-muscle-invasive bladder cancer are needed to improve treatment outcome. Here, we validated four previously reported gene expression signatures for molecular diagnosis of disease stage and ca...

  1. DNA methylation of the oxytocin receptor gene predicts neural response to ambiguous social stimuli

    Directory of Open Access Journals (Sweden)

    Allison Jack

    2012-10-01

    Oxytocin and its receptor (OXTR) play an important role in a variety of social perceptual and affiliative processes. Individual variability in social information processing likely has a strong heritable component, and as such, many investigations have established an association between common genetic variants of OXTR and variability in the social phenotype. However, to date, these investigations have primarily focused only on changes in the sequence of DNA without considering the role of epigenetic factors. DNA methylation is an epigenetic mechanism by which cells control transcription through modification of chromatin structure. DNA methylation of OXTR decreases expression of the gene and high levels of methylation have been associated with autism spectrum disorders. This link between epigenetic variability and social phenotype allows for the possibility that social processes are under epigenetic control. We hypothesized that the level of DNA methylation of OXTR would predict individual variability in social perception. Using the brain’s sensitivity to displays of animacy as a neural endophenotype of social perception, we found significant associations between the degree of OXTR methylation and brain activity evoked by the perception of animacy. Our results suggest that consideration of DNA methylation may substantially improve our ability to explain individual differences in imaging genetic association studies.

  2. Predicting human miRNA target genes using a novel evolutionary methodology

    KAUST Repository

    Aigli, Korfiati

    2012-01-01

    The discovery of miRNAs has had a great impact on traditional biology. Typically, miRNAs have the potential to bind to the 3' untranslated region (UTR) of their mRNA target genes for cleavage or translational repression. The experimental identification of their targets has many drawbacks, including cost, time and low specificity, which is why many computational approaches have been developed so far. However, existing computational approaches do not include any advanced feature selection technique, and they face problems concerning their classification performance and their interpretability. In the present paper, we propose a novel hybrid methodology which combines genetic algorithms and support vector machines in order to locate the optimal feature subset while achieving high classification performance. The proposed methodology was compared with two of the most promising existing methodologies in the problem of predicting human miRNA targets. Our approach outperforms existing methodologies in terms of classification performance while selecting a much smaller feature subset. © 2012 Springer-Verlag.

  3. Multi-gene genetic programming based predictive models for municipal solid waste gasification in a fluidized bed gasifier.

    Science.gov (United States)

    Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold

    2015-03-01

    A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming model is also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well.

  4. SVM classifier to predict genes important for self-renewal and pluripotency of mouse embryonic stem cells

    Directory of Open Access Journals (Sweden)

    Xu Huilei

    2010-12-01

    Background: Mouse embryonic stem cells (mESCs) are derived from the inner cell mass of a developing blastocyst and can be cultured indefinitely in vitro. Their distinct features are their ability to self-renew and to differentiate to all adult cell types. Genes that maintain mESCs self-renewal and pluripotency identity are of interest to stem cell biologists. Although significant steps have been made toward the identification and characterization of such genes, the list is still incomplete and controversial. For example, the overlap among candidate self-renewal and pluripotency genes across different RNAi screens is surprisingly small. Meanwhile, machine learning approaches have been used to analyze multi-dimensional experimental data and integrate results from many studies, yet they have not been applied to specifically tackle the task of predicting and classifying self-renewal and pluripotency gene membership. Results: For this study we developed a classifier, a supervised machine learning framework for predicting self-renewal and pluripotency mESCs stemness membership genes (MSMG) using support vector machines (SVM). The data used to train the classifier were derived from mESCs-related studies using mRNA microarrays, measuring gene expression in various stages of early differentiation, as well as ChIP-seq studies applied to mESCs profiling genome-wide binding of key transcription factors, such as Nanog, Oct4, and Sox2, to the regulatory regions of other genes. Comparison to other classification methods using the leave-one-out cross-validation method was employed to evaluate the accuracy and generality of the classification. Finally, two sets of candidate genes from genome-wide RNA interference screens are used to test the generality and potential application of the classifier. Conclusions: Our results reveal that an SVM approach can be useful for prioritizing genes for functional validation experiments and complement the analyses of high
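
    The abstract above evaluates classifiers with leave-one-out cross-validation. The sketch below illustrates that evaluation loop only. To stay dependency-free it substitutes a simple nearest-centroid classifier for the SVM used in the study, and the feature vectors and labels are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x.

    This is a stand-in classifier, not the SVM used in the study.
    """
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(column) for column in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(sorted(centroids), key=lambda lab: squared_distance(centroids[lab]))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy features for six genes (1 = stemness gene, 0 = other).
toy_x = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
toy_y = [1, 1, 1, 0, 0, 0]
accuracy = leave_one_out_accuracy(toy_x, toy_y)
```

    Because every sample is used for testing exactly once, the procedure suits small labelled gene sets, at the cost of training one model per sample.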

  5. PRGPred: A platform for prediction of domains of resistance gene analogue (RGA) in Arecaceae developed using machine learning algorithms

    Directory of Open Access Journals (Sweden)

    MATHODIYIL S. MANJULA

    2015-12-01

    Plant disease resistance genes (R-genes) are responsible for initiation of defense mechanisms against various phytopathogens. The majority of plant R-genes are members of very large multi-gene families, which encode structurally related proteins containing nucleotide binding site domains (NBS) and C-terminal leucine rich repeats (LRR). Other classes possess an extracellular LRR domain, a transmembrane domain and sometimes an intracellular serine/threonine kinase domain. R-proteins work in pathogen perception and/or the activation of conserved defense signaling networks. In the present study, sequences representing resistance gene analogues (RGAs) of coconut, arecanut, oil palm and date palm were collected from NCBI, sorted based on domains and assembled into a database. The sequences were analyzed in the PRINTS database to find the conserved domains and their motifs present in the RGAs. Based on these domains, we have also developed a tool to predict the domains of palm R-genes using various machine learning algorithms. The model files were selected based on the performance of the best classifier in training and testing. All this information is stored and made available in the online ‘PRGpred' database and prediction tool.

  6. Predicting the Pathogenic Potential of BRCA1 and BRCA2 Gene Variants Identified in Clinical Genetic Testing

    Directory of Open Access Journals (Sweden)

    Clare Brookes

    2015-05-01

    Objectives: Missense variants are very commonly detected when screening for mutations in the BRCA1 and BRCA2 genes. Pathogenic mutations in the BRCA1 and BRCA2 genes lead to an increased risk of developing breast, ovarian, prostate and/or pancreatic cancer. This study aimed to assess the predictive capability of in silico programmes and mutation databases in assisting diagnostic laboratories to determine the pathogenicity of sequence-detectable mutations. Methods: Between July 2011 and April 2013, an analysis was undertaken of 13 missense BRCA gene variants that had been detected in patients referred to the Genetic Health Services New Zealand (Northern Hub) for BRCA gene analysis. The analysis involved the use of 13 in silico protein prediction programmes, two in silico transcript analysis programmes and the examination of three BRCA gene databases. Results: In most of the variants, the analysis showed different in silico interpretations. This illustrates the interpretation challenges faced by diagnostic laboratories. Conclusion: Unfortunately, when using online mutation databases and carrying out in silico analyses, there is significant discordance in the classification of some missense variants in the BRCA genes. This discordance leads to complexities in interpreting and reporting these variants in a clinical context. The authors have developed a simple procedure for analysing variants; however, those of unknown significance largely remain unknown. As a consequence, the clinical value of some reports may be negligible.

  7. Local gene density predicts the spatial position of genetic loci in the interphase nucleus.

    Science.gov (United States)

    Murmann, Andrea E; Gao, Juntao; Encinosa, Marissa; Gautier, Mathieu; Peter, Marcus E; Eils, Roland; Lichter, Peter; Rowley, Janet D

    2005-11-15

    Specific chromosomal translocations are hallmarks of many human leukemias. The basis for these translocation events is poorly understood, but it has been assumed that spatial positioning of genes in the nucleus of hematopoietic cells is a contributing factor. Analysis of the nuclear 3D position of the gene MLL, which is frequently involved in chromosomal translocations, five of its translocation partners (AF4, AF6, AF9, ENL and ELL), and two control loci revealed a characteristic radial distribution pattern in all hematopoietic cells studied. Genes in areas of high local gene density were found positioned towards the nuclear center, whereas genes in regions of low gene density were detected closer to the nuclear periphery. The gene density within a 2 Mbp window was found to be a better predictor for the relative positioning of a genomic locus within the cell nucleus than the gene density of entire chromosomes. Analysis of the position of MLL, AF4, AF6 and AF9 in cell lines carrying chromosomal translocations involving these genes revealed that the position of the normal genes was different from that of the fusion genes, and this was again consistent with the changes in local gene density within a 2 Mbp window. Thus, alterations in gene density directly at translocation junctions could explain the change in the position of affected genes in leukemia cells.
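
    The abstract above uses gene density within a 2 Mbp window as a predictor of nuclear position. The sketch below shows one plausible way to compute such a local density. It assumes the window is centered on the locus and that a gene is counted by its start coordinate; neither detail is stated in the abstract, and the coordinates are hypothetical.

```python
def local_gene_density(gene_starts, locus, window_bp=2_000_000):
    """Return genes per Mbp in a window centered on `locus`.

    Assumptions (not from the source): the window is centered on the
    locus, and a gene counts if its start coordinate falls inside it.
    """
    half = window_bp // 2
    lower, upper = locus - half, locus + half
    count = sum(1 for start in gene_starts if lower <= start <= upper)
    return count / (window_bp / 1_000_000)


# Hypothetical gene start positions (bp) on one chromosome.
toy_gene_starts = [100_000, 900_000, 1_500_000, 1_900_000, 5_000_000]

dense_locus_density = local_gene_density(toy_gene_starts, locus=1_000_000)
sparse_locus_density = local_gene_density(toy_gene_starts, locus=5_000_000)
```

    In this toy example the first locus sits in a gene-dense region and the second in a gene-poor one, the contrast that the study relates to central versus peripheral nuclear positioning.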

  8. Prospective assessment of a gene signature potentially predictive of clinical benefit in metastatic melanoma patients following MAGE-A3 immunotherapeutic (PREDICT)

    Science.gov (United States)

    Saiag, P.; Gutzmer, R.; Ascierto, P. A.; Maio, M.; Grob, J.-J.; Murawa, P.; Dreno, B.; Ross, M.; Weber, J.; Hauschild, A.; Rutkowski, P.; Testori, A.; Levchenko, E.; Enk, A.; Misery, L.; Vanden Abeele, C.; Vojtek, I.; Peeters, O.; Brichard, V. G.; Therasse, P.

    2016-01-01

    Background: Genomic profiling of tumor tissue may aid in identifying predictive or prognostic gene signatures (GS) in some cancers. Retrospective gene expression profiling of melanoma and non-small-cell lung cancer led to the characterization of a GS associated with clinical benefit, including improved overall survival (OS), following immunization with the MAGE-A3 immunotherapeutic. The goal of the present study was to prospectively evaluate the predictive value of the previously characterized GS. Patients and methods: An open-label prospective phase II trial (‘PREDICT’) in patients with MAGE-A3-positive unresectable stage IIIB-C/IV-M1a melanoma. Results: Of 123 subjects who received the MAGE-A3 immunotherapeutic, 71 (58.7%) displayed the predictive GS (GS+). The 1-year OS rate was 83.1%/83.3% in the GS+/GS− populations. The rate of progression-free survival at 12 months was 5.8%/4.1% in GS+/GS− patients. The median time-to-treatment failure was 2.7/2.4 months (GS+/GS−). There was one complete response (GS−) and two partial responses (GS+). The MAGE-A3 immunotherapeutic was similarly immunogenic in both populations and had a clinically acceptable safety profile. Conclusion: Treatment of patients with MAGE-A3-positive unresectable stage IIIB-C/IV-M1a melanoma with the MAGE-A3 immunotherapeutic demonstrated an overall 1-year OS rate of 83.5%. GS− and GS+ patients had similar 1-year OS rates, indicating that in this study, GS was not predictive of outcome. Unexpectedly, the objective response rate was lower in this study than in other studies carried out in the same setting with the MAGE-A3 immunotherapeutic. Investigation of a GS to predict clinical benefit to adjuvant MAGE-A3 immunotherapeutic treatment is ongoing in another melanoma study. This study is registered at www.clinicaltrials.gov NCT00942162. PMID:27502712

  9. Unraveling toxicological mechanisms and predicting toxicity classes with gene dysregulation networks

    NARCIS (Netherlands)

    Pronk, T.E.; Someren, P. van; Stierum, R.H.; Ezendam, J.; Pennings, J.L.A.

    2013-01-01

    The use of genes for distinguishing classes of toxicity has become well established. In this paper we combine the reconstruction of a gene dysregulation network (GDN) with a classifier to assign unseen compounds to their appropriate class. Gene pairs in the GDN are dysregulated in the sense that the

  11. Prediction of lung cancer based on serum biomarkers by gene expression programming methods.

    Science.gov (United States)

    Yu, Zhuang; Chen, Xiao-Zheng; Cui, Lian-Hua; Si, Hong-Zong; Lu, Hai-Jiao; Liu, Shi-Hai

    2014-01-01

    In the diagnosis of lung cancer, rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important. Serum markers, including lactate dehydrogenase (LDH), C-reactive protein (CRP), carcino-embryonic antigen (CEA), neurone specific enolase (NSE) and Cyfra21-1, are reported to reflect lung cancer characteristics. In this study, classification of lung tumors was based on biomarkers (measured in 120 NSCLC and 60 SCLC patients) by setting up optimal biomarker joint models with a powerful computerized tool, gene expression programming (GEP). GEP is a learning algorithm that combines the advantages of genetic programming (GP) and genetic algorithms (GA). It specifically focuses on relationships between variables in sets of data and then builds models to explain these relationships, and has been successfully used in formula finding and function mining. As a basis for defining a GEP environment for SCLC and NSCLC prediction, three explicit predictive models were constructed. CEA and NSE are frequently used lung cancer markers in clinical trials, and CRP, LDH and Cyfra21-1 have significant meaning in lung cancer. On the basis of CEA and NSE, we set up three GEP models: GEP1 (CEA, NSE, Cyfra21-1), GEP2 (CEA, NSE, LDH) and GEP3 (CEA, NSE, CRP). The best classification result was obtained when CEA, NSE and Cyfra21-1 were combined: 128 of 135 subjects in the training set and 40 of 45 subjects in the test set were classified correctly, giving accuracy rates of 94.8% in the training set and 88.9% in the test set. With GEP2, the accuracy decreased significantly, by 1.5% and 6.6% in the training set and test set, respectively; with GEP3, the decreases were 0.82% and 4.45%, respectively. Serum Cyfra21-1 is a useful and sensitive serum biomarker in discriminating between NSCLC and SCLC. GEP modeling is a promising and excellent tool in diagnosis of lung cancer.
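
    The accuracy figures quoted in the abstract above follow directly from the reported counts. The short check below reproduces that arithmetic; the helper name is illustrative only.

```python
def accuracy_percent(correct, total):
    """Classification accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Counts reported in the abstract for the CEA + NSE + Cyfra21-1 model.
training_accuracy = accuracy_percent(128, 135)
test_accuracy = accuracy_percent(40, 45)

# The two subsets together cover all 120 NSCLC and 60 SCLC patients.
subjects_accounted_for = (135 + 45) == (120 + 60)
```

    The training and test sets together account for all 180 patients, which is consistent with the 120 NSCLC and 60 SCLC cases stated in the abstract.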

  12. Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes.

    Science.gov (United States)

    Pujato, Mario; Kieken, Fabien; Skiles, Amanda A; Tapinos, Nikos; Fiser, Andras

    2014-12-16

    Proper cell functioning depends on the precise spatio-temporal expression of its genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Our current knowledge on where and how TFs bind and associate to regulate gene expression is incomplete. A structure-based computational algorithm (TF2DNA) is developed to identify binding specificities of TFs. The method constructs homology models of TFs bound to DNA and assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential, after optimization in a molecular mechanics force field. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and primarily depend on the sequence identity of aligned target sequences and template structures. TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assay experiments. TLX3 motif searches in human promoter regions identified a group of genes enriched in functions relating to hematopoiesis, tissue morphology, endocrine system and connective tissue development and function. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
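
    The abstract above describes scoring all possible DNA sequences and keeping the best binders. The sketch below illustrates only the exhaustive enumeration and ranking step. The scoring function is a hypothetical per-position weight table chosen for illustration; it is not the knowledge-based potential used by TF2DNA, and the function names are assumptions.

```python
from itertools import product


def rank_sequences(score, length):
    """Enumerate every DNA sequence of the given length, best score first."""
    sequences = ("".join(bases) for bases in product("ACGT", repeat=length))
    return sorted(sequences, key=score, reverse=True)


# Hypothetical per-position base preferences (higher means stronger binding).
toy_weights = [
    {"T": 2.0, "A": 1.0},
    {"A": 2.0},
    {"A": 1.5, "T": 1.0},
]


def toy_score(sequence):
    """Sum the toy weight of each base; unlisted bases score zero."""
    return sum(weights.get(base, 0.0) for weights, base in zip(toy_weights, sequence))


ranked = rank_sequences(toy_score, length=3)
best_sequence = ranked[0]
```

    The number of candidate sequences grows as 4 to the power of the site length, so exhaustive scoring of this kind is practical only for short binding sites.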

  13. Fine mapping and candidate gene prediction of the photoperiod and thermo-sensitive genic male sterile gene pms1(t) in rice

    Institute of Scientific and Technical Information of China (English)

    Yuan-fei ZHOU; Xian-yin ZHANG; Qing-zhong XUE

    2011-01-01

    Pei'ai64S, an indica sterile variety with photoperiod and thermo-sensitive genic male sterile (PTGMS) genes, has been widely exploited for commercial seed production for "two-line" hybrid rice in China. One PTGMS gene from Pei'ai64S, pms1(t), was mapped by a bulked-extreme and recessive-class approach with simple sequence repeat (SSR) and insertion and deletion (InDel) markers. Using linkage analysis for the F2 mapping population consisting of 320 completely male sterile individuals derived from a cross between Pei'ai64S and 93-11 (indica restorer) lines, the pms1(t) gene was delimited to the region between the RM21242 (0.2 cM) and YF11 (0.2 cM) markers on the short arm of chromosome 7. The interval containing the pms1(t) locus, which co-segregated with RM6776, is a 101.1 kb region based on the Nipponbare rice genome. Fourteen predicted loci were found in this region by the Institute for Genomic Research (TIGR) Genomic Annotation. Based on bioinformatics analysis of its function, the locus LOC_Os07g12130 is predicted to encode a protein containing a Myb-like DNA-binding domain, and may process the transcript with thermosensory response. The reverse transcription-polymerase chain reaction (RT-PCR) results revealed that the mRNA levels of LOC_Os07g12130 were altered in different photoperiod and temperature treatments. Thus, the LOC_Os07g12130 locus is the most likely candidate gene for pms1(t). These results may facilitate not only molecular marker assisted selection of PTGMS genes, but also cloning of the pms1(t) gene itself.

  14. Multiclass prediction with partial least square regression for gene expression data: applications in breast cancer intrinsic taxonomy.

    Science.gov (United States)

    Huang, Chi-Cheng; Tu, Shih-Hsin; Huang, Ching-Shui; Lien, Heng-Hui; Lai, Liang-Chuan; Chuang, Eric Y

    2013-01-01

    Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools are limited to binary responses. Our aim was to apply partial least square (PLS) regression to breast cancer intrinsic taxonomy, in which five distinct molecular subtypes have been identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to many more samples left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829 as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
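
    Agreement between two classifiers, as reported here with a weighted Kappa, can be sketched as follows. This is a minimal illustration that assumes linear disagreement weights and an arbitrary category order; the abstract does not state which weighting scheme was used, and the labels `s1`, `s2`, `s3` are made-up placeholders rather than PAM50 calls.

```python
from collections import Counter


def weighted_kappa(labels_a, labels_b, categories):
    """Linearly weighted Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(labels_a)
    observed = Counter((index[a], index[b]) for a, b in zip(labels_a, labels_b))
    row = Counter(index[a] for a in labels_a)
    col = Counter(index[b] for b in labels_b)
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # assumed linear disagreement weight
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * (row[i] / n) * (col[j] / n)
    if expected_disagreement == 0.0:
        return 1.0  # both raters used one and the same category throughout
    return 1.0 - observed_disagreement / expected_disagreement


# Toy labels only; they are not data from the study.
calls_a = ["s1", "s2", "s3", "s1", "s2", "s3"]
calls_b = ["s1", "s2", "s3", "s1", "s3", "s3"]
kappa = weighted_kappa(calls_a, calls_b, ["s1", "s2", "s3"])
print(round(kappa, 3))
```

    How unclassified samples are handled changes the result: treating them as an extra category or dropping them before the comparison gives different values, which is the contrast the abstract draws between 0.541 and 0.829.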

  15. A regulatory network modeled from wild-type gene expression data guides functional predictions in Caenorhabditis elegans development

    Directory of Open Access Journals (Sweden)

    Stigler Brandilyn

    2012-06-01

    Background: Complex gene regulatory networks underlie many cellular and developmental processes. While a variety of experimental approaches can be used to discover how genes interact, few biological systems have been systematically evaluated to the extent required for an experimental definition of the underlying network. Therefore, the development of computational methods that can use limited experimental data to define and model a gene regulatory network would provide a useful tool to evaluate many important but incompletely understood biological processes. Such methods can assist in extracting all relevant information from data that are available, identify unexpected regulatory relationships and prioritize future experiments. Results: To facilitate the analysis of gene regulatory networks, we have developed a computational modeling pipeline method that complements traditional evaluation of experimental data. For a proof-of-concept example, we have focused on the gene regulatory network in the nematode C. elegans that mediates the developmental choice between mesodermal (muscle) and ectodermal (skin) cell fates in the embryonic C lineage. We have used gene expression data to build two models: a knowledge-driven model based on gene expression changes following gene perturbation experiments, and a data-driven mathematical model derived from time-course gene expression data recovered from wild-type animals. We show that both models can identify a rich set of network gene interactions. Importantly, the mathematical model built only from wild-type data can predict interactions demonstrated by the perturbation experiments better than chance, and better than an existing knowledge-driven model built from the same data set. The mathematical model also provides new biological insight, including a dissection of zygotic from maternal functions of a key transcriptional regulator, PAL-1, and identification of non-redundant activities of the T-box genes

  16. A regulatory network modeled from wild-type gene expression data guides functional predictions in Caenorhabditis elegans development.

    Science.gov (United States)

    Stigler, Brandilyn; Chamberlin, Helen M

    2012-06-26

    Complex gene regulatory networks underlie many cellular and developmental processes. While a variety of experimental approaches can be used to discover how genes interact, few biological systems have been systematically evaluated to the extent required for an experimental definition of the underlying network. Therefore, the development of computational methods that can use limited experimental data to define and model a gene regulatory network would provide a useful tool to evaluate many important but incompletely understood biological processes. Such methods can assist in extracting all relevant information from data that are available, identify unexpected regulatory relationships and prioritize future experiments. To facilitate the analysis of gene regulatory networks, we have developed a computational modeling pipeline method that complements traditional evaluation of experimental data. For a proof-of-concept example, we have focused on the gene regulatory network in the nematode C. elegans that mediates the developmental choice between mesodermal (muscle) and ectodermal (skin) cell fates in the embryonic C lineage. We have used gene expression data to build two models: a knowledge-driven model based on gene expression changes following gene perturbation experiments, and a data-driven mathematical model derived from time-course gene expression data recovered from wild-type animals. We show that both models can identify a rich set of network gene interactions. Importantly, the mathematical model built only from wild-type data can predict interactions demonstrated by the perturbation experiments better than chance, and better than an existing knowledge-driven model built from the same data set. The mathematical model also provides new biological insight, including a dissection of zygotic from maternal functions of a key transcriptional regulator, PAL-1, and identification of non-redundant activities of the T-box genes tbx-8 and tbx-9. 
This work provides a strong

  17. Hierarchy of gene expression data is predictive of future breast cancer outcome

    Science.gov (United States)

    Chen, Man; Deem, Michael W.

    2013-10-01

    We calculate measures of hierarchy in gene and tissue networks of breast cancer patients. We find that the likelihood of metastasis in the future is correlated with increased values of network hierarchy for expression networks of cancer-associated genes, due to the correlated expression of cancer-specific pathways. Conversely, future metastasis and quick relapse times are negatively correlated with the values of network hierarchy in the expression network of all genes, due to the dedifferentiation of gene pathways and circuits. These results suggest that the hierarchy of gene expression may be useful as an additional biomarker for breast cancer prognosis.

  18. Seven-CpG-based prognostic signature coupled with gene expression predicts survival of oral squamous cell carcinoma.

    Science.gov (United States)

    Shen, Sipeng; Wang, Guanrong; Shi, Qianwen; Zhang, Ruyang; Zhao, Yang; Wei, Yongyue; Chen, Feng; Christiani, David C

    2017-01-01

    DNA methylation has started a recent revolution in genomics biology by identifying key biomarkers for multiple cancers, including oral squamous cell carcinoma (OSCC), the most common head and neck squamous cell carcinoma. A multi-stage screening strategy was used to identify DNA-methylation-based signatures for OSCC prognosis. We used The Cancer Genome Atlas (TCGA) data as the training set, and the results were validated in two independent datasets from Gene Expression Omnibus (GEO). The correlation between DNA methylation and corresponding gene expression and the prognostic value of the gene expression were explored as well. Seven DNA methylation CpG sites were identified that were significantly associated with OSCC overall survival. The prognostic signature, a weighted linear combination of the seven CpG sites, successfully distinguished the overall survival of OSCC patients and had a moderate predictive ability for survival [training set: hazard ratio (HR) = 3.23, P = 5.52 × 10^-10, area under the curve (AUC) = 0.76; validation set 1: HR = 2.79, P = 0.010, AUC = 0.67; validation set 2: HR = 3.69, P = 0.011, AUC = 0.66]. The signature retained statistical significance in stratification analyses by human papillomavirus status, clinical stage, age, gender, smoking status, and grade. Expression of genes corresponding to candidate CpG sites (AJAP1, SHANK2, FOXA2, MT1A, ZNF570, HOXC4, and HOXB4) was also significantly associated with patients' survival. A signature integrating DNA methylation, gene expression, and clinical information showed a superior ability for prognostic prediction (AUC = 0.78). A prognostic signature integrating DNA methylation, gene expression, and clinical information provides better prognostic prediction for OSCC patients than clinical information alone.
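
    A signature defined as a weighted linear combination of CpG methylation values can be sketched as below. Everything numeric here is hypothetical: the weights, the methylation values and the median cut-off are illustrative assumptions, since the abstract reports neither the coefficients nor the grouping rule.

```python
from statistics import median


def risk_score(methylation_values, weights):
    """Weighted linear combination of per-site methylation values."""
    if len(methylation_values) != len(weights):
        raise ValueError("one weight is required per CpG site")
    return sum(w * m for w, m in zip(weights, methylation_values))


def split_by_median(scores):
    """Label each score 'high' if it exceeds the cohort median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical weights for seven CpG sites (not the published coefficients).
weights = [0.8, -0.5, 0.3, 0.6, -0.2, 0.4, 0.1]

# Hypothetical methylation values (0 to 1) for three patients.
patients = [
    [0.9, 0.1, 0.8, 0.7, 0.2, 0.6, 0.5],
    [0.1, 0.9, 0.2, 0.1, 0.8, 0.2, 0.3],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
]

scores = [risk_score(p, weights) for p in patients]
groups = split_by_median(scores)
print(list(zip(scores, groups)))
```

    In a real analysis the weights would come from a fitted survival model on the training set and then be applied unchanged to the validation sets.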

  19. Arabidopsis CPR5 is a senescence-regulatory gene with pleiotropic functions as predicted by the evolutionary theory of senescence.

    Science.gov (United States)

    Jing, Hai-Chun; Anderson, Lisa; Sturre, Marcel J G; Hille, Jacques; Dijkwel, Paul P

    2007-01-01

    Evolutionary theories of senescence predict that genes with pleiotropic functions are important for senescence regulation. In plants there is no direct molecular genetic test for the existence of such senescence-regulatory genes. Arabidopsis cpr5 mutants exhibit multiple phenotypes including hypersensitivity to various signalling molecules, constitutive expression of pathogen-related genes, abnormal trichome development, spontaneous lesion formation, and accelerated leaf senescence. These indicate that CPR5 is a beneficial gene which controls multiple facets of the Arabidopsis life cycle. Ectopic expression of CPR5 restored all the mutant phenotypes. However, in transgenic plants with increased CPR5 transcripts, accelerated leaf senescence was observed in detached leaves and at late development around 50 d after germination, as illustrated by the earlier onset of senescence-associated physiological and molecular markers. Thus, CPR5 has early-life beneficial effects by repressing cell death and ensuring normal plant development, but late-life deleterious effects by promoting developmental senescence. As such, CPR5 appears to function as a typical senescence-regulatory gene as predicted by the evolutionary theories of senescence.

  20. Gene-gene interactions between HNF4A and KCNJ11 in predicting Type 2 diabetes in women

    NARCIS (Netherlands)

    Qi, L.; van Dam, R. M.; Asselbergs, F. W.; Hu, F. B.

    2007-01-01

    Aims: Recent studies indicate transcription factor hepatocyte nuclear factor 4 alpha (HNF-4 alpha, HNF4A) modulates the transcription of the pancreatic B-cell ATP-sensitive K+ (K-ATP) channel subunit Kir6.2 gene (KCNJ11). Both HNF4A and KCNJ11 have previously been associated with diabetes risk but

  1. Aberrant gene methylation in the peritoneal fluid is a risk factor predicting peritoneal recurrence in gastric cancer

    Institute of Scientific and Technical Information of China (English)

    Masatsugu Hiraki; Yoshihiko Kitajima; Seiji Sato; Jun Nakamura; Kazuyoshi Hashiguchi; Hirokazu Noshiro; Kohji Miyazaki

    2010-01-01

    AIM: To investigate whether gene methylation in the peritoneal fluid (PF) predicts peritoneal recurrence in gastric cancer patients. METHODS: The gene methylation of CHFR (checkpoint with forkhead and ring finger domains), p16, RUNX3 (runt-related transcription factor 3), E-cadherin, hMLH1 (mutL homolog 1), ABCG2 (ATP-binding cassette, sub-family G, member 2) and BNIP3 (BCL2/adenovirus E1B 19 kDa interacting protein 3) was analyzed in 80 specimens of PF by quantitative methylation-specific polymerase chain r...

  2. Next-generation sequencing of the porcine skeletal muscle transcriptome for computational prediction of microRNA gene targets.

    Directory of Open Access Journals (Sweden)

    Tara G McDaneld

    BACKGROUND: MicroRNA are a class of small RNAs that regulate gene expression by inhibiting translation of protein encoding transcripts through targeting of a microRNA-protein complex by base-pairing of the microRNA sequence to cognate recognition sequences in the 3' untranslated region (UTR) of the mRNA. Target identification for a given microRNA sequence is generally accomplished by informatics analysis of predicted mRNA sequences present in the genome or in databases of transcript sequence for the tissue of interest. However, gene models for porcine skeletal muscle transcripts in current databases, specifically complete sequence of the 3' UTR, are inadequate for this exercise. METHODOLOGY/PRINCIPAL FINDINGS: To provide data necessary to identify gene targets for microRNA in porcine skeletal muscle, normalized cDNA libraries were sequenced using Roche 454 GS-FLX pyrosequencing and de novo assembly of transcripts enriched in the 3' UTR was performed using the MIRA sequence assembly program. Over 725 million bases of sequence were generated, which assembled into 18,202 contigs. Sequence reads were mapped to a 3' UTR database containing porcine sequences. The 3' UTR that mapped to the database were examined to predict targets for previously identified microRNA that had been separately sequenced from the same porcine muscle sample used to generate the cDNA libraries. For genes with microRNA-targeted 3' UTR, KEGG pathways were computationally determined in order to identify potential functional effects of these microRNA-targeted transcripts. CONCLUSIONS: Through next-generation sequencing of transcripts expressed in skeletal muscle, mapping reads to a 3' UTR database, and prediction of microRNA target sites in the 3' UTR, our results identified genes expressed in porcine skeletal muscle and predicted the microRNA that target these genes. Additionally, identification of pathways regulated by these microRNA-targeted genes provides us with a set of

  3. Prioritizing predicted cis-regulatory elements for co-expressed gene sets based on Lasso regression models.

    Science.gov (United States)

    Hu, Hong; Roqueiro, Damian; Dai, Yang

    2011-01-01

    Computational prediction of cis-regulatory elements for a set of co-expressed genes based on sequence analysis provides an overwhelming volume of potential transcription factor binding sites. It presents a challenge to prioritize transcription factors for regulatory functional studies. A novel approach based on the use of Lasso regression models is proposed to address this problem. We examine the ability of the Lasso model using time-course microarray data obtained from a comprehensive study of gene expression profiles in skin and mucosal wounds in mouse over all stages of wound healing.
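
    Lasso regression prioritizes predictors by shrinking most coefficients to exactly zero. The sketch below shows that effect with a plain coordinate-descent solver on toy numbers. The layout (rows as genes, columns as binding-site scores for three hypothetical transcription factors, `y` as an expression measure) is an assumption for illustration and is not the authors' actual model or data.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 without an intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from all predictors except column j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data: expression depends only on the first hypothetical factor.
X = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0], [3.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
y = [2.0, 4.0, 6.0, 8.0]

beta = lasso_coordinate_descent(X, y, lam=0.1)
ranked = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
print(beta, ranked)
```

    Predictors left with non-zero coefficients are the ones a Lasso-based ranking would put forward first; a larger `lam` keeps fewer of them.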

  4. Prediction

    CERN Document Server

    Sornette, Didier

    2010-01-01

    This chapter first presents a rather personal view of some different aspects of predictability, going in crescendo from simple linear systems to high-dimensional nonlinear systems with stochastic forcing, which exhibit emergent properties such as phase transitions and regime shifts. Then, a detailed correspondence between the phenomenology of earthquakes, financial crashes and epileptic seizures is offered. The presented statistical evidence provides the substance of a general phase diagram for understanding the many facets of the spatio-temporal organization of these systems. A key insight is to organize the evidence and mechanisms in terms of two summarizing measures: (i) amplitude of disorder or heterogeneity in the system and (ii) level of coupling or interaction strength among the system's components. On the basis of the recently identified remarkable correspondence between earthquakes and seizures, we present detailed information on a class of stochastic point processes that has been found to be particu...

  5. A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury

    Science.gov (United States)

    Kohonen, Pekka; Parkkinen, Juuso A.; Willighagen, Egon L.; Ceder, Rebecca; Wennerberg, Krister; Kaski, Samuel; Grafström, Roland C.

    2017-01-01

    Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we utilize a ‘big data compacting and data fusion’ concept to capture diverse adverse outcomes on cellular and organismal levels. The approach generates from a transcriptomics data set a ‘predictive toxicogenomics space’ (PTGS) tool composed of 1,331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Involving ∼2.5 × 10^8 data points and 1,300 compounds to construct and validate the PTGS, the tool serves to: explain dose-dependent cytotoxicity effects, provide a virtual cytotoxicity probability estimate intrinsic to omics data, predict chemically-induced pathological states in liver resulting from repeated dosing of rats, and furthermore, predict human drug-induced liver injury (DILI) from hepatocyte experiments. Analysing 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, leading to a hitherto unseen level of DILI prediction accuracy. PMID:28671182

  6. Prediction of disease-gene-drug relationships following a differential network analysis.

    Science.gov (United States)

    Zickenrott, S; Angarica, V E; Upadhyaya, B B; del Sol, A

    2016-01-01

    Great efforts are being devoted to gaining a deeper understanding of disease-related dysregulations, which is central for introducing novel and more effective therapeutics in the clinic. However, most human diseases are highly multifactorial at the molecular level, involving dysregulation of multiple genes and interactions in gene regulatory networks. This issue hinders the elucidation of disease mechanisms, including the identification of disease-causing genes and regulatory interactions. Most current network-based approaches for the study of disease mechanisms do not take into account significant differences in gene regulatory network topology between healthy and disease phenotypes. Moreover, these approaches are not able to efficiently guide database search for connections between drugs, genes and diseases. We propose a differential network-based methodology for identifying candidate target genes and chemical compounds for reverting disease phenotypes. Our method relies on transcriptomics data to reconstruct gene regulatory networks corresponding to healthy and disease states separately. Further, it identifies candidate genes essential for triggering the reversion of the disease phenotype based on network stability determinants underlying differential gene expression. In addition, our method selects and ranks chemical compounds targeting these genes, which could be used as therapeutic interventions for complex diseases.

  7. IGF-I induced genes in stromal fibroblasts predict the clinical outcome of breast and lung cancer patients

    Directory of Open Access Journals (Sweden)

    Herrmann Richard

    2010-01-01

    Background: Insulin-like growth factor-1 (IGF-I) signalling is important for cancer initiation and progression. Given the emerging evidence for the role of the stroma in these processes, we aimed to characterize the effects of IGF-I on cancer cells and stromal cells separately. Methods: We used an ex vivo culture model and measured gene expression changes after IGF-I stimulation with cDNA microarrays. In vitro data were correlated with in vivo findings by comparing the results with published expression datasets on human cancer biopsies. Results: Upon stimulation with IGF-I, breast cancer cells and stromal fibroblasts show some common and other distinct response patterns. Among the up-regulated genes in the stromal fibroblasts we observed a significant enrichment in proliferation associated genes. The expression of the IGF-I induced genes was coherent and it provided a basis for the segregation of the patients into two groups. Patients with tumours with highly expressed IGF-I induced genes had a significantly lower survival rate than patients whose tumours showed lower levels of IGF-I induced gene expression (P = 0.029 - Norway/Stanford and P = 7.96e-09 - NKI dataset). Furthermore, based on an IGF-I induced gene expression signature derived from primary lung fibroblasts, a separation of prognostically different lung cancers was possible (P = 0.007 - Bhattacharjee and P = 0.008 - Garber dataset). Conclusion: Expression patterns of genes induced by IGF-I in primary breast and lung fibroblasts accurately predict outcomes in breast and lung cancer patients. Furthermore, these IGF-I induced gene signatures derived from stromal fibroblasts might be promising predictors for the response to IGF-I targeted therapies. See the related commentary by Werner and Bruchim: http://www.biomedcentral.com/1741-7015/8/2

  8. Applied the additive hazard model to predict the survival time of patient with diffuse large B- cell lymphoma and determine the effective genes, using microarray data

    Directory of Open Access Journals (Sweden)

    Arefa Jafarzadeh Kohneloo

    2015-09-01

    Background: Recent studies have shown that genes affecting the survival time of cancer patients play an important role as risk factors or preventive factors. The present study was designed to determine the genes that affect survival time in patients with diffuse large B-cell lymphoma and to predict survival time using the selected genes. Materials & Methods: This cohort study was conducted on 40 patients with diffuse large B-cell lymphoma. For these patients, the expression of 2042 genes was measured. To predict survival time, the semi-parametric additive survival model was combined with two gene selection methods, elastic net and lasso. The two methods were evaluated by plotting the area under the ROC curve over time and calculating the integral of this curve. Results: The elastic net method identified 10 genes, and the Lasso-Cox method identified 7 genes. GENE3325X increased survival time (P=0.006), whereas GENE3980X and GENE377X reduced survival time (P=0.004). These three genes were selected as important genes by both methods. Conclusion: This study showed that the elastic net method outperformed the common Lasso method in terms of predictive power. Moreover, applying the additive model instead of Cox regression to microarray data is a usable way to predict the survival time of patients.

  9. Hierarchy in gene expression is predictive of risk, progression, and outcome in adult acute myeloid leukemia

    Science.gov (United States)

    Tripathi, Shubham; Deem, Michael W.

    2015-02-01

    Cancer progresses with a change in the structure of the gene network in normal cells. We define a measure of organizational hierarchy in gene networks of affected cells in adult acute myeloid leukemia (AML) patients. With a retrospective cohort analysis based on the gene expression profiles of 116 AML patients, we find that the likelihood of future cancer relapse and the level of clinical risk are directly correlated with the level of organization in the cancer related gene network. We also explore the variation of the level of organization in the gene network with cancer progression. We find that this variation is non-monotonic, which implies the fitness landscape in the evolution of AML cancer cells is non-trivial. We further find that the hierarchy in gene expression at the time of diagnosis may be a useful biomarker in AML prognosis.

  10. Post genome-wide association studies of novel genes associated with type 2 diabetes show gene-gene interaction and high predictive value.

    Directory of Open Access Journals (Sweden)

    Stéphane Cauchi

    BACKGROUND: Recently, several Genome Wide Association (GWA) studies in populations of European descent have identified and validated novel single nucleotide polymorphisms (SNPs) highly associated with type 2 diabetes (T2D). Our aims were to validate these markers in other European and non-European populations, then to assess their combined effect in a large French study comparing T2D and normal glucose tolerant (NGT) individuals. METHODOLOGY/PRINCIPAL FINDINGS: In the same French population analyzed in our previous GWA study (3,295 T2D and 3,595 NGT), strong associations with T2D were found for CDKAL1 (OR(rs7756992) = 1.30 [1.19-1.42], P = 2.3 × 10^-9), CDKN2A/2B (OR(rs10811661) = 0.74 [0.66-0.82], P = 3.5 × 10^-8) and more modestly for IGFBP2 (OR(rs1470579) = 1.17 [1.07-1.27], P = 0.0003) SNPs. These results were replicated in both Israeli Ashkenazi (577 T2D and 552 NGT) and Austrian (504 T2D and 753 NGT) populations (except for CDKAL1) but not in the Moroccan population (521 T2D and 423 NGT). In the overall group of French subjects (4,232 T2D and 4,595 NGT), IGFBP2 and CXCR4 synergistically interacted with (LOC38776, SLC30A8, HHEX) and (NGN3, CDKN2A/2B), respectively, encoding for proteins presumably regulating pancreatic endocrine cell development and function. The T2D risk increased strongly when risk alleles, including the previously discovered T2D-associated TCF7L2 rs7903146 SNP, were combined (8.68-fold for the 14% of French individuals carrying 18 to 30 risk alleles with an allelic OR of 1.24). With an area under the ROC curve of 0.86, only 15 novel loci were necessary to discriminate French individuals susceptible to develop T2D. CONCLUSIONS/SIGNIFICANCE: In addition to TCF7L2, SLC30A8 and HHEX, initially identified by the French GWA scan, CDKAL1, IGFBP2 and CDKN2A/2B strongly associate with T2D in French individuals, and mostly in populations of Central European descent but not in Moroccan subjects. Genes expressed in the pancreas interact together and their

  11. Gene Expression Signature TOPFOX Reflecting Chromosomal Instability Refines Prediction of Prognosis in Grade 2 Breast Cancer

    DEFF Research Database (Denmark)

    Szasz, A.; Li, Qiyuan; Sztupinszki, Z.

    2011-01-01

    Purpose: To assess the ability of genes selected from those reflecting chromosomal instability to identify good and poor prognostic subsets of Grade 2 breast carcinomas. Methods: We selected genes for splitting grade 2 tumours into low and high grade type groups by using public databases. Patients... were diagnosed between 1999–2002 at the Budai MÁV Hospital. 187 formalin-fixed, paraffin-embedded breast cancer samples were included in the qPCR-based measurement of expression of AURKA, FOXM1, TOP2A and TPX2 genes. The expression of the genes was correlated to recurrence-free survival (RFS...

  12. Efficient CRISPR/Cas9-Mediated Versatile, Predictable, and Donor-Free Gene Knockout in Human Pluripotent Stem Cells

    Directory of Open Access Journals (Sweden)

    Zhongliang Liu

    2016-09-01

    Loss-of-function studies in human pluripotent stem cells (hPSCs) require efficient methodologies for lesion of genes of interest. Here, we introduce a donor-free paired gRNA-guided CRISPR/Cas9 knockout strategy (paired-KO) for efficient and rapid gene ablation in hPSCs. Through paired-KO, we succeeded in targeting all genes of interest with high biallelic targeting efficiencies. More importantly, during paired-KO, the cleaved DNA was repaired mostly through direct end joining without insertions/deletions (precise ligation), which makes the lesion product predictable. The paired-KO remained highly efficient for one-step targeting of multiple genes and was also efficient for targeting of microRNA, while for long non-coding RNA over 8 kb, cleavage of a short fragment of the core promoter region was sufficient to eradicate downstream gene transcription. This work suggests that the paired-KO strategy is a simple and robust system for loss-of-function studies for both coding and non-coding genes in hPSCs.

  13. Prognostic and predictive value of VHL gene alteration in renal cell carcinoma: a meta-analysis and review.

    Science.gov (United States)

    Kim, Bum Jun; Kim, Jung Han; Kim, Hyeong Su; Zang, Dae Young

    2017-01-17

    The von Hippel-Lindau (VHL) gene is often inactivated in sporadic renal cell carcinoma (RCC) by mutation or promoter hypermethylation. The prognostic or predictive value of VHL gene alteration is not well established. We conducted this meta-analysis to evaluate the association between VHL alteration and clinical outcomes in patients with RCC. We searched PUBMED, MEDLINE and EMBASE for articles including the following terms in their titles, abstracts, or keywords: 'kidney or renal', 'carcinoma or cancer or neoplasm or malignancy', 'von Hippel-Lindau or VHL', 'alteration or mutation or methylation', and 'prognostic or predictive'. Six studies fulfilled the inclusion criteria, and a total of 663 patients with clear cell RCC were included in the study: 244 patients who received anti-vascular endothelial growth factor (VEGF) therapy in the predictive value analysis and 419 in the prognostic value analysis. Out of 663 patients, 410 (61.8%) had VHL alteration. The meta-analysis showed no association between VHL gene alteration and overall response rate (relative risk = 1.47 [95% CI, 0.81-2.67], P = 0.20) or progression free survival (hazard ratio = 1.02 [95% CI, 0.72-1.44], P = 0.91) in patients with RCC who received VEGF-targeted therapy. There was also no correlation between VHL alteration and overall survival (HR = 0.80 [95% CI, 0.56-1.14], P = 0.21). In conclusion, this meta-analysis indicates that VHL gene alteration has no prognostic or predictive value in patients with clear cell RCC.

  14. Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering

    Directory of Open Access Journals (Sweden)

    Sharma Animesh

    2007-01-01

    Background: The four heterogeneous childhood cancers, neuroblastoma, non-Hodgkin lymphoma, rhabdomyosarcoma, and Ewing sarcoma, present a similar histology of small round blue cell tumor (SRBCT) and are thus often misdiagnosed. Identification of biomarkers for distinguishing these cancers is a well studied problem. Existing methods typically evaluate each gene separately and do not take into account the nonlinear interaction between genes and the tools that are used to design the diagnostic prediction system. Consequently, more genes are usually identified as necessary for prediction. We propose a general scheme for finding a small set of biomarkers to design a diagnostic system for accurate classification of the cancer subgroups. We use multilayer networks with online gene selection ability and relational fuzzy clustering to identify a small set of biomarkers for accurate classification of the training and blind test cases of a well studied data set. Results: Our method discerned just seven biomarkers that precisely categorized the four subgroups of cancer both in training and blind samples. For the same problem, others suggested 19–94 genes. These seven biomarkers include three novel genes (NAB2, LSP1 and EHD1), not identified by others, with distinct class-specific signatures and important roles in cancer biology, including cellular proliferation, transendothelial migration and trafficking of MHC class antigens. Interestingly, NAB2 is downregulated in other tumors including non-Hodgkin lymphoma and neuroblastoma, but we observed moderate to high upregulation in a few cases of Ewing sarcoma and rhabdomyosarcoma, suggesting that NAB2 might be mutated in these tumors. These genes can discover the subgroups correctly with unsupervised learning, can differentiate non-SRBCT samples and they perform equally well with other machine learning tools including support vector machines. These biomarkers lead to four simple human interpretable

  15. pathDIP: an annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis

    Science.gov (United States)

    Rahmati, Sara; Abovsky, Mark; Pastrello, Chiara; Jurisica, Igor

    2017-01-01

    Molecular pathway data are essential in current computational and systems biology research. While there are many primary and integrated pathway databases, several challenges remain, including low proteome coverage (57%), low overlap across different databases, unavailability of direct information about underlying physical connectivity of pathway members, and high fraction of protein-coding genes without any pathway annotations, i.e. ‘pathway orphans’. In order to address all these challenges, we developed pathDIP, which integrates data from 20 source pathway databases, ‘core pathways’, with physical protein–protein interactions to predict biologically relevant protein–pathway associations, referred to as ‘extended pathways’. Cross-validation determined 71% recovery rate of our predictions. Data integration and predictions increase coverage of pathway annotations for protein-coding genes to 86%, and provide novel annotations for 5732 pathway orphans. PathDIP (http://ophid.utoronto.ca/pathdip) annotates 17 070 protein-coding genes with 4678 pathways, and provides multiple query, analysis and output options. PMID:27899558

  16. The FurA regulon in Anabaena sp. PCC 7120: in silico prediction and experimental validation of novel target genes.

    Science.gov (United States)

    González, Andrés; Angarica, Vladimir Espinosa; Sancho, Javier; Fillat, María F

    2014-04-01

    In the filamentous cyanobacterium Anabaena sp. PCC 7120, the ferric uptake regulator FurA functions as a global transcriptional regulator. Although several analyses have focused on elucidating the FurA-regulatory network, the number of target genes described for this essential transcription factor is limited to a handful of examples. In this article, we combine an in silico genome-wide predictive approach with experimental determinations to better define the FurA regulon. Predicted FurA-binding sites were identified upstream of 215 genes belonging to diverse functional categories including iron homeostasis, photosynthesis and respiration, heterocyst differentiation, oxidative stress defence and light-dependent signal transduction mechanisms, among others. The probabilistic model proved to be effective at discerning FurA boxes from non-cognate sequences, while subsequent electrophoretic mobility shift assay experiments confirmed the in vitro specific binding of FurA to at least 20 selected predicted targets. Gene-expression analyses further supported the dual role of FurA as a transcriptional modulator that can act both as repressor and as activator. In either role, the in vitro affinity of the protein to its target sequences is strongly dependent on metal co-regulator and reducing conditions, suggesting that FurA couples in vivo iron homeostasis and the response to oxidative stress to major physiological processes in cyanobacteria.

  17. Prediction of Metastasis and Recurrence in Colorectal Cancer Based on Gene Expression Analysis: Ready for the Clinic?

    Energy Technology Data Exchange (ETDEWEB)

    Shibayama, Masaki [Sysmex Corporation, Central Research Laboratories, Kobe 651-2271 (Japan); Maak, Matthias; Nitsche, Ulrich [Chirurgische Klinik, Klinikum Rechts der Isar der TUM, München 81657 (Germany); Gotoh, Kengo [Sysmex Corporation, Central Research Laboratories, Kobe 651-2271 (Japan); Rosenberg, Robert; Janssen, Klaus-Peter, E-mail: klaus-peter.janssen@lrz.tum.de [Chirurgische Klinik, Klinikum Rechts der Isar der TUM, München 81657 (Germany)

    2011-07-07

    Cancers of the colon and rectum, which rank among the most frequent human tumors, are currently treated by surgical resection in locally restricted tumor stages. However, disease recurrence and formation of local and distant metastasis frequently occur even in cases with successful curative resection of the primary tumor (R0). Recent technological advances in molecular diagnostic analysis have led to a wealth of knowledge about the changes in gene transcription in all stages of colorectal tumors. Differential gene expression, or transcriptome analysis, has been proposed by many groups to predict disease recurrence, clinical outcome, and also response to therapy, in addition to the well-established clinico-pathological factors. However, the clinical usability of gene expression profiling as a reliable and robust prognostic tool that allows evidence-based clinical decisions is currently under debate. In this review, we will discuss the most recent data on the prognostic significance and potential clinical application of genome wide expression analysis in colorectal cancer.

  18. Molecular Clone, Expression, and Prediction of Construction and Function to Key Genes of Interleukin Family of Porcine

    Institute of Scientific and Technical Information of China (English)

    JING Zhi-zhong; DOU Yong-xi; LUO Qi-hui; CHEN Guo-hua; MENG Xue-lian; ZHENG Ya-dong; LUO Xue-nong; CAI Xue-peng

    2007-01-01

    This research aimed to clone, express, and analyze the structure and function of major molecules of the porcine interleukin family. Genes of the porcine interleukin family were cloned by RT-PCR from porcine PBMC stimulated with LPS and PHA and then expressed in E. coli, and the structure and function of these molecules were predicted with ExPASy. The results showed that the genes of IL-4, IL-6, and IL-18 were successfully cloned and expressed. Furthermore, the expression products of recombinant IL-4 and IL-6 both have multiple biological activities. Comparison of these genes with NCBI/GenBank data showed nucleotide sequence homologies of 99.25, 99.21, and 100%, respectively, and large differences from other animal species. The prediction results showed that all these molecules contain several phosphorylation, glycosylation, protein kinase, and signal transduction binding sites in their secondary structure, and all are compact globular proteins in spatial configuration. These structural characteristics are the basis for their multiple biological functions. The genes, structure and function of key molecules of the porcine interleukin family were successfully cloned, expressed, and analyzed in this paper.

  19. SVMRFE based approach for prediction of most discriminatory gene target for type II diabetes

    Directory of Open Access Journals (Sweden)

    Atul Kumar

    2017-06-01

    Type II diabetes is a chronic condition that affects the way the body metabolizes sugar, an important source of fuel, and it is now becoming widespread all over the world. It is therefore necessary to identify new potential drug targets that not only control the disease but can also treat it. Support vector machines (SVMs) are classifiers with the potential to separate discriminatory genes from non-discriminatory genes. SVMRFE (SVM with recursive feature elimination), a modification of SVM, ranks the genes by their discriminatory power and eliminates the genes that are not involved in causing the disease. A gene regulatory network was built from the top-ranked coding genes to identify their role in causing diabetes. To further validate the results, a pathway study was performed to identify the involvement of the coding genes in type II diabetes. The genes obtained from this study showed a significant involvement in causing the disease and may serve as potential drug targets.
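
    The ranking-and-elimination loop described above can be sketched generically: train a linear SVM, drop the gene with the smallest squared weight, and repeat. The example below is an illustration only and does not reproduce the paper's pipeline or data. The hand-rolled trainer, its hyperparameters, and the synthetic expression matrix are all assumptions; in practice a library SVM would normally replace `train_linear_svm`.

```python
import random


def train_linear_svm(X, y, features, epochs=200, lr=0.05, lam=0.01):
    """Full-batch subgradient descent on the L2-regularised hinge loss.
    Labels must be +1 / -1. Returns one weight per active feature."""
    w = {f: 0.0 for f in features}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {f: lam * w[f] for f in features}
        grad_b = 0.0
        for row, label in zip(X, y):
            margin = label * (sum(w[f] * row[f] for f in features) + b)
            if margin < 1:  # this sample violates the margin
                for f in features:
                    grad_w[f] -= label * row[f] / n
                grad_b -= label / n
        for f in features:
            w[f] -= lr * grad_w[f]
        b -= lr * grad_b
    return w


def svm_rfe(X, y, n_keep=1):
    """Retrain repeatedly, each time dropping the feature with the smallest
    squared weight. Returns (surviving features, elimination order)."""
    features = list(range(len(X[0])))
    eliminated = []
    while len(features) > n_keep:
        w = train_linear_svm(X, y, features)
        worst = min(features, key=lambda f: w[f] ** 2)
        features.remove(worst)
        eliminated.append(worst)
    return features, eliminated


# Synthetic data (made up): gene 0 tracks the class label, genes 1-5 are noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(30)]
X = [[3.0 * label + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(5)]
     for label in y]

kept, eliminated = svm_rfe(X, y, n_keep=1)
print("kept:", kept, "eliminated in order:", eliminated)
```

    Genes eliminated last are ranked highest, so the reverse of the elimination order gives the ranking from most to least discriminatory.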

  20. Simpler Evaluation of Predictions and Signature Stability for Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Yvonne E. Pittelkow

    2009-01-01

    Scientific advances are raising expectations that patient-tailored treatment will soon be available. The development of resulting clinical approaches needs to be based on well-designed experimental and observational procedures that provide data to which proper biostatistical analyses are applied. Gene expression microarray and related technologies are rapidly evolving and are providing extremely large gene expression profiles containing many thousands of measurements. Choosing a subset from these gene expression measurements to include in a gene expression signature is one of the many challenges needing to be met. Choice of this signature depends on many factors, including the selection of patients in the training set. The reliability and reproducibility of the resultant prognostic gene signature therefore need to be evaluated in a way that is relevant to the clinical setting. A relatively straightforward approach is based on cross validation, with separate selection of genes at each iteration to avoid selection bias. Within this approach we developed two different methods, one based on forward selection, the other on genes that were statistically significant in all training blocks of data. We demonstrate our approach to gene signature evaluation with a well-known breast cancer data set.
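
    The key point, repeating gene selection inside every training fold so that test samples never influence the signature, can be sketched as follows. This is a minimal illustration on synthetic data, assuming a simple mean-difference ranking and a nearest-centroid classifier; neither is claimed to be the authors' method, and all function names are hypothetical.

```python
import random


def select_genes(X, y, k):
    """Rank genes by absolute difference in class means; keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((abs(sum(a) / len(a) - sum(b) / len(b)), g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])


def centroids(X, y, genes):
    """Per-class mean expression over the selected genes."""
    out = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        out[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return out


def predict(row, cents, genes):
    """Assign the class whose centroid is nearest in squared distance."""
    def dist(c):
        return sum((row[g] - m) ** 2 for g, m in zip(genes, c))
    return min(cents, key=lambda label: dist(cents[label]))


def cross_validate(X, y, n_folds=5, k=2):
    """k-fold CV with gene selection repeated inside every training fold."""
    correct, signatures = 0, []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = select_genes(X_tr, y_tr, k)  # test samples are never seen here
        cents = centroids(X_tr, y_tr, genes)
        correct += sum(predict(X[i], cents, genes) == y[i] for i in test_idx)
        signatures.append(genes)
    return correct / len(X), signatures


# Synthetic data (made up): 40 samples, 50 genes, genes 0 and 1 informative.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0, 1) + (4.0 if g < 2 and y[i] == 1 else 0.0) for g in range(50)]
     for i in range(40)]

accuracy, signatures = cross_validate(X, y)
print(accuracy, signatures)
```

    Comparing the per-fold gene lists in `signatures` gives a simple view of signature stability: a reproducible signature selects largely the same genes in every fold.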

  1. Gene Expression Signature TOPFOX Reflecting Chromosomal Instability Refines Prediction of Prognosis in Grade 2 Breast Cancer

    DEFF Research Database (Denmark)

    Szasz, A.; Li, Qiyuan; Sztupinszki, Z.

    2011-01-01

    were diagnosed between 1999–2002 at the Budai MÁV Hospital. 187 formalin-fixed, paraffin-embedded breast cancer samples were included in the qPCR-based measurement of expression of AURKA, FOXM1, TOP2A and TPX2 genes. The expression of the genes was correlated to recurrence-free survival (RFS...

  2. Can gene expression profiling predict survival for patients with squamous cell carcinoma of the lung?

    Directory of Open Access Journals (Sweden)

    Endo Chiaki

    2004-12-01

    Background: Lung cancer remains the leading cause of cancer death worldwide. Patients with similar lung cancer may experience quite different clinical outcomes. Reliable molecular prognostic markers are needed to characterize the disparity. In order to identify the genes responsible for the aggressiveness of squamous cell carcinoma of the lung, we applied DNA microarray technology to a case control study. Fifteen patients with surgically treated stage I squamous cell lung cancer were selected. Ten were one-to-one matched on tumour size and grade, age, gender, and smoking status; five died of lung cancer recurrence within 24 months (high-aggressive group), and five survived more than 54 months after surgery (low-aggressive group). Five additional tissues were included as test samples. Unsupervised and supervised approaches were used to explore the relationship among samples and identify differentially expressed genes. We also evaluated the gene markers' accuracy in segregating samples to their respective group. Functional gene networks for the significant genes were retrieved, and their association with survival was tested. Results: Unsupervised clustering did not group tumours based on survival experience. At p ... Conclusion: The overall gene expression pattern between the high and low aggressive squamous cell carcinomas of the lung did not differ significantly with the control of confounding factors. A small subset of genes or genes in specific pathways may be responsible for the aggressive nature of a tumour and could potentially serve as panels of prognostic markers for stage I squamous cell lung cancer.

  3. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    In this paper, a new supervised clustering and classification method is proposed. First, the application of discriminant partial least squares (DPLS) for the selection of a minimum number of key genes is applied on a gene expression microarray data set. Second, supervised hierarchical clustering ...

  4. Genetic organisation, mobility and predicted functions of genes on integrated, mobile genetic elements in sequenced strains of Clostridium difficile.

    Directory of Open Access Journals (Sweden)

    Michael S M Brouwer

    BACKGROUND: Clostridium difficile is the leading cause of hospital-associated diarrhoea in the US and Europe. Recently the incidence of C. difficile-associated disease has risen dramatically and concomitantly with the emergence of 'hypervirulent' strains associated with more severe disease and increased mortality. C. difficile contains numerous mobile genetic elements, resulting in the potential for a highly plastic genome. In the first sequenced strain, 630, there is one proven conjugative transposon (CTn), Tn5397, and six putative CTns (CTn1, CTn2 and CTn4-7), of which CTn4 and CTn5 were capable of excision. In the second sequenced strain, R20291, two further CTns were described. RESULTS: CTn1, CTn2, CTn4, CTn5 and CTn7 were shown to excise from the genome of strain 630 and transfer to strain CD37. A putative CTn from R20291, misleadingly termed a phage island previously, was shown to excise and to contain three putative mobilisable transposons, one of which was capable of excision. In silico probing of C. difficile genome sequences with recombinase gene fragments identified new putative conjugative and mobilisable transposons related to the elements in strains 630 and R20291. CTn5-like elements were described occupying different insertion sites in different strains, CTn1-like elements that have lost the ability to excise in some ribotype 027 strains were described and one strain was shown to contain CTn5-like and CTn7-like elements arranged in tandem. Additionally, using bioinformatics, we updated previous gene annotations and predicted novel functions for the accessory gene products on these new elements. CONCLUSIONS: The genomes of the C. difficile strains examined contain highly related CTns suggesting recent horizontal gene transfer. Several elements were capable of excision and conjugative transfer. The presence of antibiotic resistance genes and genes predicted to promote adaptation to the intestinal environment suggests that CTns play a

  5. Screening in silico predicted remotely acting NF1 gene regulatory elements for mutations in patients with neurofibromatosis type 1.

    Science.gov (United States)

    Hamby, Stephen E; Reviriego, Pablo; Cooper, David N; Upadhyaya, Meena; Chuzhanova, Nadia

    2013-08-15

    Neurofibromatosis type 1 (NF1), a neuroectodermal disorder, is caused by germline mutations in the NF1 gene. NF1 affects approximately 1/3,000 individuals worldwide, with about 50% of cases representing de novo mutations. Although the NF1 gene was identified in 1990, the underlying gene mutations still remain undetected in a small but obdurate minority of NF1 patients. We postulated that in these patients, hitherto undetected pathogenic mutations might occur in regulatory elements far upstream of the NF1 gene. In an attempt to identify such remotely acting regulatory elements, we reasoned that some of them might reside within DNA sequences that (1) have the potential to interact at distance with the NF1 gene and (2) lie within a histone H3K27ac-enriched region, a characteristic of active enhancers. Combining Hi-C data, obtained by means of the chromosome conformation capture technique, with data on the location and level of histone H3K27ac enrichment upstream of the NF1 gene, we predicted in silico the presence of two remotely acting regulatory regions, located, respectively, approximately 600 kb and approximately 42 kb upstream of the NF1 gene. These regions were then sequenced in 47 NF1 patients in whom no mutations had been found in either the NF1 or SPRED1 gene regions. Five patients were found to harbour DNA sequence variants in the distal H3K27ac-enriched region. Although these variants are of uncertain pathological significance and still remain to be functionally characterized, this approach promises to be of general utility for the detection of mutations underlying other inherited disorders that may be caused by mutations in remotely acting regulatory elements.

  6. Differential responses to Wnt and PCP disruption predict expression and developmental function of conserved and novel genes in a cnidarian.

    Directory of Open Access Journals (Sweden)

    Pascal Lapébie

    2014-09-01

    We have used Digital Gene Expression analysis to identify, without bilaterian bias, regulators of cnidarian embryonic patterning. Transcriptome comparison between un-manipulated Clytia early gastrula embryos and ones in which the key polarity regulator Wnt3 was inhibited using morpholino antisense oligonucleotides (Wnt3-MO) identified a set of significantly over and under-expressed transcripts. These code for candidate Wnt signaling modulators, orthologs of other transcription factors, secreted and transmembrane proteins known as developmental regulators in bilaterian models or previously uncharacterized, and also many cnidarian-restricted proteins. Comparisons between embryos injected with morpholinos targeting Wnt3 and its receptor Fz1 defined four transcript classes showing remarkable correlation with spatiotemporal expression profiles. Class 1 and 3 transcripts tended to show sustained expression at "oral" and "aboral" poles respectively of the developing planula larva, class 2 transcripts in cells ingressing into the endodermal region during gastrulation, while class 4 gene expression was repressed at the early gastrula stage. The preferential effect of Fz1-MO on expression of class 2 and 4 transcripts can be attributed to Planar Cell Polarity (PCP) disruption, since it was closely matched by morpholino knockdown of the specific PCP protein Strabismus. We conclude that endoderm and post gastrula-specific gene expression is particularly sensitive to PCP disruption while Wnt/β-catenin signaling dominates gene regulation along the oral-aboral axis. Phenotype analysis using morpholinos targeting a subset of transcripts indicated developmental roles consistent with expression profiles for both conserved and cnidarian-restricted genes. Overall our unbiased screen allowed systematic identification of regionally expressed genes and provided functional support for a shared eumetazoan developmental regulatory gene set with both predicted and

  7. Global state measures of the dentate gyrus gene expression system predict antidepressant-sensitive behaviors.

    Directory of Open Access Journals (Sweden)

    Benjamin A Samuels

    BACKGROUND: Selective serotonin reuptake inhibitors (SSRIs) such as fluoxetine are the most common form of medication treatment for major depression. However, approximately 50% of depressed patients fail to achieve an effective treatment response. Understanding how gene expression systems respond to treatments may be critical for understanding antidepressant resistance. METHODS: We take a novel approach to this problem by demonstrating that the gene expression system of the dentate gyrus responds to fluoxetine (FLX), a commonly used antidepressant medication, in a stereotyped manner involving changes in the expression levels of thousands of genes. The aggregate behavior of this large-scale systemic response was quantified with principal components analysis (PCA), yielding a single quantitative measure of the global gene expression system state. RESULTS: Quantitative measures of system state were highly correlated with variability in levels of antidepressant-sensitive behaviors in a mouse model of depression treated with fluoxetine. Analysis of dorsal and ventral dentate samples in the same mice indicated that system state co-varied across these regions despite their reported functional differences. Aggregate measures of gene expression system state were very robust and remained unchanged when different microarray data processing algorithms were used and even when completely different sets of gene expression levels were used for their calculation. CONCLUSIONS: System state measures provide a robust method to quantify and relate global gene expression system state variability to behavior and treatment. State variability also suggests that the diversity of reported changes in gene expression levels in response to treatments such as fluoxetine may represent different perspectives on unified but noisy global gene expression system state level responses. Studying regulation of gene expression systems at the state level may be useful in guiding new
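
    One common way to reduce a samples-by-genes matrix to a single number per sample with PCA is to project each sample onto the first principal component. The sketch below does this with power iteration on a tiny made-up matrix; it assumes the "system state" measure is a first-component score, which the abstract suggests but does not spell out, and the function name is hypothetical.

```python
import math


def first_principal_component(X, n_iter=100):
    """Leading eigenvector of the sample covariance (power iteration) and the
    projection of each centred sample onto it. Rows are samples, columns genes."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in X]
    # Start from a uniform direction; this must not be orthogonal to the answer.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        # Covariance times v without forming the matrix: Xc^T (Xc v) / (n - 1).
        proj = [sum(r[j] * v[j] for j in range(p)) for r in centred]
        new_v = [sum(proj[i] * centred[i][j] for i in range(n)) / (n - 1)
                 for j in range(p)]
        norm = math.sqrt(sum(x * x for x in new_v))
        if norm == 0:
            break
        v = [x / norm for x in new_v]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return v, scores


# Toy expression matrix (made up): 4 samples x 3 genes moving together.
expression = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 1.5],
    [4.0, 8.0, 2.0],
]
loadings, state = first_principal_component(expression)
print(loadings, state)
```

    Each entry of `state` is one sample's position along the dominant axis of co-variation, which is the kind of aggregate quantity that can then be correlated with a behavioral measure.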

  8. The gene expression profile of inflammatory, hypoxic and metabolic genes predicts the metastatic spread of human head and neck squamous cell carcinoma.

    Science.gov (United States)

    Clatot, Florian; Gouérant, Sophie; Mareschal, Sylvain; Cornic, Marie; Berghian, Anca; Choussy, Olivier; El Ouakif, Faissal; François, Arnaud; Bénard, Magalie; Ruminy, Philippe; Picquenot, Jean-Michel; Jardin, Fabrice

    2014-03-01

    To assess the prognostic value of the expression profile of the main genes implicated in hypoxia, glucose and lactate metabolism, inflammation, angiogenesis and extracellular matrix interactions for the metastatic spread of head and neck squamous cell carcinoma. Using a high-throughput qRT-PCR, we performed an unsupervised clustering analysis based on the expression of 42 genes for 61 patients. Usual prognostic factors and clustering analysis results were related to metastasis free survival. With a median follow-up of 48 months, 19 patients died from a metastatic evolution of their head and neck squamous cell carcinoma and one from a local recurrence. The unsupervised clustering analysis distinguished two groups of genes that were related to metastatic evolution. A capsular rupture (p=0.005) and the "cluster CXCL12 low" (p=0.002) were found to be independent prognostic factors for metastasis free survival. Using a Linear Predictive Score methodology, we established a 9-gene model (VHL, PTGER4, HK1, SLC16A4, DLL4, CXCL12, CXCR4, PTGER3 and CA9) that was capable of classifying the samples into the 2 clusters with 90% accuracy. In this cohort, our clustering analysis underlined the independent prognostic value of the expression of a panel of genes involved in hypoxia and tumor environment. It allowed us to define a 9-gene model which can be applied routinely to classify newly diagnosed head and neck squamous cell carcinoma. If confirmed by an independent prospective study, this approach may help future clinical management of these aggressive tumors. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. A two-gene signature, SKI and SLAMF1, predicts time-to-treatment in previously untreated patients with chronic lymphocytic leukemia.

    Directory of Open Access Journals (Sweden)

    Carmen D Schweighofer

    We developed and validated a two-gene signature that predicts prognosis in previously-untreated chronic lymphocytic leukemia (CLL) patients. Using a 65 sample training set, from a cohort of 131 patients, we identified the best clinical models to predict time-to-treatment (TTT) and overall survival (OS). To identify individual genes or combinations in the training set with expression related to prognosis, we cross-validated univariate and multivariate models to predict TTT. We identified four gene sets (5, 6, 12, or 13 genes) to construct multivariate prognostic models. By optimizing each gene set on the training set, we constructed 11 models to predict the time from diagnosis to treatment. Each model also predicted OS and added value to the best clinical models. To determine which contributed the most value when added to clinical variables, we applied the Akaike Information Criterion. Two genes were consistently retained in the models with clinical variables: SKI (v-SKI avian sarcoma viral oncogene homolog) and SLAMF1 (signaling lymphocytic activation molecule family member 1; CD150). We optimized a two-gene model and validated it on an independent test set of 66 samples. This two-gene model predicted prognosis better on the test set than any of the known predictors, including ZAP70 and serum β2-microglobulin.

  10. A Two-Gene Signature, SKI and SLAMF1, Predicts Time-to-Treatment in Previously Untreated Patients with Chronic Lymphocytic Leukemia

    Science.gov (United States)

    Schweighofer, Carmen D.; Coombes, Kevin R.; Barron, Lynn L.; Diao, Lixia; Newman, Rachel J.; Ferrajoli, Alessandra; O'Brien, Susan; Wierda, William G.; Luthra, Rajyalakshmi; Medeiros, L. Jeffrey; Keating, Michael J.; Abruzzo, Lynne V.

    2011-01-01

    We developed and validated a two-gene signature that predicts prognosis in previously-untreated chronic lymphocytic leukemia (CLL) patients. Using a 65 sample training set, from a cohort of 131 patients, we identified the best clinical models to predict time-to-treatment (TTT) and overall survival (OS). To identify individual genes or combinations in the training set with expression related to prognosis, we cross-validated univariate and multivariate models to predict TTT. We identified four gene sets (5, 6, 12, or 13 genes) to construct multivariate prognostic models. By optimizing each gene set on the training set, we constructed 11 models to predict the time from diagnosis to treatment. Each model also predicted OS and added value to the best clinical models. To determine which contributed the most value when added to clinical variables, we applied the Akaike Information Criterion. Two genes were consistently retained in the models with clinical variables: SKI (v-SKI avian sarcoma viral oncogene homolog) and SLAMF1 (signaling lymphocytic activation molecule family member 1; CD150). We optimized a two-gene model and validated it on an independent test set of 66 samples. This two-gene model predicted prognosis better on the test set than any of the known predictors, including ZAP70 and serum β2-microglobulin. PMID:22194822
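
    The Akaike Information Criterion trades goodness of fit against the number of parameters, AIC = 2k - 2 ln(L), and the model with the lowest value is preferred. The sketch below shows the comparison on invented log-likelihoods; the model names and numbers are hypothetical placeholders and are not results from the study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted models: (maximised log-likelihood, number of parameters).
# These values are made up purely to illustrate the comparison.
candidates = {
    "clinical only": (-120.0, 3),
    "clinical + 2 genes": (-112.0, 5),
    "clinical + 13 genes": (-108.0, 16),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name}: AIC = {value:.1f}")
print("preferred:", best)
```

    In this toy comparison the larger gene set fits slightly better but pays a penalty for its extra parameters, which is the mechanism by which an information criterion can favour a compact signature.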

  11. For better and for worse: genes and parenting interact to predict future behavior in romantic relationships.

    Science.gov (United States)

    Masarik, April S; Conger, Rand D; Donnellan, M Brent; Stallings, Michael C; Martin, Monica J; Schofield, Thomas J; Neppl, Tricia K; Scaramella, Laura V; Smolen, Andrew; Widaman, Keith F

    2014-06-01

    We tested the differential susceptibility hypothesis with respect to connections between interactions in the family of origin and subsequent behaviors with romantic partners. Focal or target participants (G2) in an ongoing longitudinal study (N = 352) were observed interacting with their parents (G1) during adolescence and again with their romantic partners in adulthood. Independent observers rated positive engagement and hostility by G1 and G2 during structured interaction tasks. We created an index for hypothesized genetic plasticity by summing G2's allelic variation for polymorphisms in 5 genes (serotonin transporter gene [linked polymorphism], 5-HTT; ankyrin repeat and kinase domain containing 1 gene/dopamine receptor D2 gene, ANKK1/DRD2; dopamine receptor D4 gene, DRD4; dopamine active transporter gene, DAT; and catechol-O-methyltransferase gene, COMT). Consistent with the differential susceptibility hypothesis, G2s exposed to more hostile and positively engaged parenting behaviors during adolescence were more hostile or positively engaged toward a romantic partner if they had higher scores on the genetic plasticity index. In short, genetic factors moderated the connection between earlier experiences in the family of origin and future romantic relationship behaviors, for better and for worse.

  12. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions.

    Directory of Open Access Journals (Sweden)

    Soumya Raychaudhuri

    2009-06-01

    Translating a set of disease regions into insight about pathogenic mechanisms requires not only the ability to identify the key disease genes within them, but also the biological relationships among those key genes. Here we describe a statistical method, Gene Relationships Among Implicated Loci (GRAIL), that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts. We first evaluated GRAIL by assessing its ability to identify subsets of highly related genes in common pathways from validated lipid and height SNP associations from recent genome-wide studies. We then tested GRAIL by assessing its ability to separate true disease regions from many false positive disease regions in two separate practical applications in human genetics. First, we took 74 nominally associated Crohn's disease SNPs and applied GRAIL to identify a subset of 13 SNPs with highly related genes. Of these, ten convincingly validated in follow-up genotyping; genotyping results for the remaining three were inconclusive. Next, we applied GRAIL to 165 rare deletion events seen in schizophrenia cases (less than one-third of which are contributing to disease risk). We demonstrate that GRAIL is able to identify a subset of 16 deletions containing highly related genes; many of these genes are expressed in the central nervous system and play a role in neuronal synapses. GRAIL offers a statistically robust approach to identifying functionally related genes from across multiple disease regions that likely represent key disease pathways. An online version of this method is available for public use (http://www.broad.mit.edu/mpg/grail/).

  13. A computational method based on the integration of heterogeneous networks for predicting disease-gene associations.

    Directory of Open Access Journals (Sweden)

    Xingli Guo

    The identification of disease-causing genes is a fundamental challenge in human health and of great importance in improving medical care, and provides a better understanding of gene functions. Recent computational approaches based on the interactions among human proteins and disease similarities have shown their power in tackling the issue. In this paper, a novel systematic and global method that integrates two heterogeneous networks for prioritizing candidate disease-causing genes is provided, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein interactions. In this method, the association score function between a query disease and a candidate gene is defined as the weighted sum of all the association scores between similar diseases and neighbouring genes. Moreover, the topological correlation of these two heterogeneous networks can be incorporated into the definition of the score function, and finally an iterative algorithm is designed for this issue. This method was tested with 10-fold cross-validation on all 1,126 diseases that have at least one known causal gene, and it ranked the correct gene as one of the top ten in 622 of all the 1,428 cases, significantly outperforming a state-of-the-art method called PRINCE. The results of this method were applied to study three multi-factorial disorders: breast cancer, Alzheimer disease and diabetes mellitus type 2, and some suggestions of novel causal genes and candidate disease-causing subnetworks were provided for further investigation.
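
    The weighted-sum score described above can be illustrated with a small sketch. It shows one plausible reading, in which each known disease-gene association contributes the product of a disease similarity and a gene-gene network weight. The exact weighting, the topological correlation term, and the iterative algorithm of the paper are not reproduced, and all names and numbers below are hypothetical.

```python
def association_score(disease, gene, disease_sim, gene_weight, known):
    """Sum, over known (disease', gene') pairs, of
    similarity(disease, disease') * weight(gene, gene').
    Illustrative form only; not the authors' exact score function."""
    total = 0.0
    for d2, g2 in known:
        s = disease_sim.get(disease, {}).get(d2, 0.0)
        w = gene_weight.get(gene, {}).get(g2, 0.0)
        total += s * w
    return total


def rank_candidates(disease, candidates, disease_sim, gene_weight, known):
    """Order candidate genes by decreasing score (ties broken by name)."""
    scored = [(association_score(disease, g, disease_sim, gene_weight, known), g)
              for g in candidates]
    return [g for _, g in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy networks (made up): D1 is the query disease.
disease_sim = {"D1": {"D2": 0.8, "D3": 0.1}}
gene_weight = {"G2": {"G1": 0.9}, "G3": {"G4": 0.9}, "G5": {}}
known = {("D2", "G1"), ("D3", "G4")}

ranking = rank_candidates("D1", ["G2", "G3", "G5"], disease_sim, gene_weight, known)
print(ranking)
print(f"top-ten recovery reported in the abstract: {622 / 1428:.1%}")
```

    Here the candidate adjacent to a gene of a highly similar disease ranks first, mirroring the observation that genes causing similar diseases tend to lie close together in the interaction network.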

  14. A structured population modeling framework for quantifying and predicting gene expression noise in flow cytometry data.

    Science.gov (United States)

    Flores, Kevin B

    2013-07-01

    We formulated a structured population model with distributed parameters to identify mechanisms that contribute to gene expression noise in time-dependent flow cytometry data. The model was validated using cell population-level gene expression data from two experiments with synthetically engineered eukaryotic cells. Our model captures the qualitative noise features of both experiments and accurately fits the data from the first experiment. Our results suggest that cellular switching between high and low expression states and transcriptional re-initiation are important factors needed to accurately describe gene expression noise with a structured population model.

  15. Prediction of Cis-Regulatory Elements Controlling Genes Differentially Expressed by Retinal and Choroidal Vascular Endothelial Cells.

    Science.gov (United States)

    Choi, Dongseok; Appukuttan, Binoy; Binek, Sierra J; Planck, Stephen R; Stout, J Timothy; Rosenbaum, James T; Smith, Justine R

    2008-01-01

    Cultured endothelial cells of the human retina and choroid demonstrate distinct patterns of gene expression. We hypothesized that differential gene expression reflected differences in the interactions of transcription factors and respective cis-regulatory motif(s) in these two endothelial cell subpopulations, recognizing that motifs often exist as modules. We tested this hypothesis in silico by using TRANSFAC Professional and CisModule to identify cis-regulatory motifs and modules in genes that were differentially expressed by human retinal versus choroidal endothelial cells, as identified by analysis of a microarray data set. Motifs corresponding to eight transcription factors were significantly (p < 0.05) differentially abundant in genes that were relatively highly expressed in retinal (i.e., GCCR, HMGIY, HSF1, p53, VDR) or choroidal (i.e., E2F, YY1, ZF5) endothelial cells. Predicted cis-regulatory modules were quite different for these two groups of genes. Our findings raise the possibility of exploiting specific cis-regulatory motifs to target therapy at the ocular endothelial cell subtypes responsible for neovascular age-related macular degeneration or proliferative diabetic retinopathy.

  16. AntiSMASH 4.0 - improvements in chemistry prediction and gene cluster boundary identification

    NARCIS (Netherlands)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.; Lu, Xiaowen; Schwalen, Christopher J.; Kautsar, Satria A.; Suarez Duran, Hernando G.; Los Santos, De Emmanuel L.C.; Kim, Hyun Uk; Nave, Mariana; Dickschat, Jeroen S.; Mitchell, Douglas A.; Shelest, Ekaterina; Breitling, Rainer; Takano, Eriko; Lee, Sang Yup; Weber, Tilmann; Medema, Marnix H.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production

  17. PREDICTION OF THE COURSE OF OSTEOARTHROSIS FROM mTOR (MAMMALIAN TARGET OF RAPAMYCIN) GENE EXPRESSION

    Directory of Open Access Journals (Sweden)

    E V Chetina

    2012-01-01

    Results. Analysis of gene expression in the outpatients with OA identified two subgroups: in one subgroup (n = 13), mTOR expression was considerably lower than in the control group; the expression of ATG1 and p21 did not differ greatly from the control, and that of caspase 3 and TNF-α was significantly higher. The other outpatients (n = 20) and all the examined patients needing endoprosthetic replacement were found to have a higher gene expression of mTOR, ATG1, p21, caspase 3, and TNF-α than the control group. Before endoprosthetic replacement, severe joint destruction in patients with OA was associated with enhanced gene expression of mTOR, ATG1, p21, and caspase 3. Conclusion. In early-stage disease, increased mTOR gene expression may serve as a prognostic marker of the severity of the disease and articular cartilage destruction.

  18. miRNA regulation of gene expression: a predictive bioinformatics analysis in the postnatally developing monkey hippocampus.

    Directory of Open Access Journals (Sweden)

    Grégoire Favre

    Regulation of gene expression in the postnatally developing hippocampus might contribute to the emergence of selective memory function. However, the mechanisms that underlie the co-regulation of expression of hundreds of genes in different cell types at specific ages in distinct hippocampal regions have yet to be elucidated. By performing genome-wide microarray analyses of gene expression in distinct regions of the monkey hippocampal formation during early postnatal development, we identified one particular group of genes exhibiting a down-regulation of expression, between birth and six months of age in CA1 and after one year of age in CA3, to reach expression levels observed at 6-12 years of age. Bioinformatics analyses using NCBI, miRBase, TargetScan, microRNA.org and Affymetrix tools identified a number of miRNAs capable of regulating the expression of these genes simultaneously in different cell types, i.e., in neurons, astrocytes and oligodendrocytes. Interestingly, sixty-five percent of these miRNAs are conserved across species, from rodents to humans; whereas thirty-five percent are specific to primates, including humans. In addition, we found that some genes exhibiting greater down-regulation of their expression were the predicted targets of a greater number of these miRNAs. In sum, miRNAs may play a fundamental role in the co-regulation of gene expression in different cell types. This mechanism is partially conserved across species, and may thus contribute to the similarity of basic hippocampal characteristics across mammals. This mechanism also exhibits a phylogenetic diversity that may contribute to more subtle species differences in hippocampal structure and function observed at the cellular level.

  19. A computational approach to identify predictive gene signatures in Triple Negative Breast Cancer

    OpenAIRE

    Nuzzo, Simona

    2014-01-01

    Microarray technology has been extensively used to detect patterns in gene expression that stem from regulatory interactions. Seminal studies demonstrated that the synergistic use of microarray-based techniques and bioinformatics analysis of genomic data might not only further the understanding of pathological phenotypes, but also provide lists of genes to dissect a disease into distinct groups, with different diagnostic or prognostic characteristics. Nonetheless, optimism for microarray-base...

  20. Gene expression arrays as a tool to unravel mechanisms of normal tissue radiation injury and prediction of response

    Institute of Scientific and Technical Information of China (English)

    Jacqueline JCM Kruse; Fiona A Stewart

    2007-01-01

    Over the past 5 years there has been a rapid increase in the use of microarray technology in the field of cancer research. The majority of studies use microarray analysis of tumor biopsies for profiling of molecular characteristics in an attempt to produce robust classifiers for prognosis. There are now several published gene sets that have been shown to predict for aggressive forms of breast cancer, where patients are most likely to benefit from adjuvant chemotherapy and tumors most likely to develop distant metastases, or be resistant to treatment. The number of publications relating to the use of microarrays for analysis of normal tissue damage, after cancer treatment or genotoxic exposure, is much more limited. A PubMed literature search was conducted using the following keywords and combination of terms: radiation, normal tissue, microarray, gene expression profiling, prediction. With respect to normal tissue radiation injury, microarrays have been used in three ways: (1) to generate gene signatures to identify sensitive and resistant populations (prognosis); (2) to identify sets of biomarker genes for estimating radiation exposure, either accidental or as a result of terrorist attack (diagnosis); (3) to identify genes and pathways involved in tissue response to injury (mechanistic). In this article we will review all (relevant) papers that covered our literature search criteria on microarray technology as it has been applied to normal tissue radiation biology and discuss how successful this has been in defining predisposition markers for radiation sensitivity or how it has helped us to unravel molecular mechanisms leading to acute and late tissue toxicity. We also discuss some of the problems and limitations in application and interpretation of such data.

  1. Identification of epigenetically regulated genes that predict patient outcome in neuroblastoma

    Directory of Open Access Journals (Sweden)

    Enström Camilla

    2011-02-01

    Background: Epigenetic mechanisms such as DNA methylation and histone modifications are important regulators of gene expression and are frequently involved in silencing tumor suppressor genes. Methods: In order to identify genes that are epigenetically regulated in neuroblastoma tumors, we treated four neuroblastoma cell lines with the demethylating agent 5-Aza-2'-deoxycytidine (5-Aza-dC) either separately or in conjunction with the histone deacetylase inhibitor trichostatin A (TSA). Expression was analyzed using whole-genome expression arrays to identify genes activated by the treatment. These data were then combined with data from genome-wide DNA methylation arrays to identify candidate genes silenced in neuroblastoma due to DNA methylation. Results: We present eight genes (KRT19, PRKCDBP, SCNN1A, POU2F2, TGFBI, COL1A2, DHRS3 and DUSP23) that are methylated in neuroblastoma, most of them not previously reported as such, some of which also distinguish between biological subsets of neuroblastoma tumors. Differential methylation was observed for the genes SCNN1A (p [...]), PRKCDBP (p [...]) and KRT19 (p [...]). [...] KRT19 and PRKCDBP was significantly lower in patients that have died from the disease compared with patients with no evidence of disease (fold change -8.3, p = 0.01 for KRT19 and fold change -2.4, p = 0.04 for PRKCDBP). Conclusions: In our study, a low methylation frequency of SCNN1A, PRKCDBP and KRT19 is significantly associated with favorable outcome in neuroblastoma. It is likely that analysis of specific DNA methylation will be one of several methods in future patient therapy stratification protocols for treatment of childhood neuroblastomas.

  2. Gene Expression Profiles Can Predict Panitumumab Monotherapy Responsiveness in Human Tumor Xenograft Models

    Directory of Open Access Journals (Sweden)

    Michael J. Boedigheimer

    2013-02-01

    Conclusion: A model was constructed from microarray data that prospectively predicts responsiveness to panitumumab in xenograft models. This approach may help identify patients, independent of disease origin, likely to benefit from panitumumab.

  3. The importance of virulence prediction and gene networks in microbial risk assessment

    DEFF Research Database (Denmark)

    Wassenaar, Gertrude Maria; Gamieldien, Junaid; Shatkin, JoAnne

    2007-01-01

    For microbial risk assessment, it is necessary to recognize and predict virulence of bacterial pathogens, including their ability to contaminate foods. Hazard characterization requires data on strain variability regarding virulence and survival during food processing. Moreover, information on vir...

  4. Superfamily assignments for the yeast proteome through integration of structure prediction with the gene ontology.

    Directory of Open Access Journals (Sweden)

    Lars Malmström

    2007-04-01

    Saccharomyces cerevisiae is one of the best-studied model organisms, yet the three-dimensional structure and molecular function of many yeast proteins remain unknown. Yeast proteins were parsed into 14,934 domains, and those lacking sequence similarity to proteins of known structure were folded using the Rosetta de novo structure prediction method on the World Community Grid. This structural data was integrated with process, component, and function annotations from the Saccharomyces Genome Database to assign yeast protein domains to SCOP superfamilies using a simple Bayesian approach. We have predicted the structure of 3,338 putative domains and assigned SCOP superfamily annotations to 581 of them. We have also assigned structural annotations to 7,094 predicted domains based on fold recognition and homology modeling methods. The domain predictions and structural information are available in an online database at http://rd.plos.org/10.1371_journal.pbio.0050076_01.
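
    The abstract does not spell out its "simple Bayesian approach", so the following is only a minimal sketch of how two evidence sources could be combined with Bayes' rule. The conditional-independence assumption, the function name `superfamily_posterior`, and all numbers are hypothetical and are not taken from the paper.

```python
def superfamily_posterior(prior, structure_likelihood, annotation_likelihood):
    """Combine two evidence sources per superfamily (naive Bayes, illustrative only).

    prior                 : dict superfamily -> P(superfamily)
    structure_likelihood  : dict superfamily -> P(structure evidence | superfamily)
    annotation_likelihood : dict superfamily -> P(GO annotation evidence | superfamily)
    Assumes the two evidence sources are conditionally independent.
    """
    unnormalized = {
        sf: prior[sf]
        * structure_likelihood.get(sf, 0.0)
        * annotation_likelihood.get(sf, 0.0)
        for sf in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no superfamily is compatible with the evidence")
    return {sf: value / total for sf, value in unnormalized.items()}

# Hypothetical numbers: the structure match is ambiguous, the annotation is decisive.
posterior = superfamily_posterior(
    prior={"sf_a": 0.5, "sf_b": 0.5},
    structure_likelihood={"sf_a": 0.6, "sf_b": 0.4},
    annotation_likelihood={"sf_a": 0.9, "sf_b": 0.1},
)
```

    With these made-up inputs, the annotation evidence tips an otherwise ambiguous structural match strongly towards `sf_a`, which is the kind of integration the abstract describes.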

  5. Meta4: a web-application for sharing and annotating metagenomic gene predictions using web-services

    Directory of Open Access Journals (Sweden)

    Emily J Richardson

    2013-09-01

    Whole-genome-shotgun (WGS) metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web-application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web-services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website (http://www.ark-genomics.org/bioinformatics/meta4), code is available on GitHub (https://github.com/mw55309/meta4), a cloud image is available, and an example implementation can be seen at http://www.ark-genomics.org/tools/meta4

  6. Gene Expression Differences Predict Treatment Outcome of Merkel Cell Carcinoma Patients

    Directory of Open Access Journals (Sweden)

    Loren Masterson

    2014-01-01

    Due to the rarity of Merkel cell carcinoma (MCC), prospective clinical trials have not been practical. This study aimed to identify biomarkers with prognostic significance. While sixty-two patients were identified who were treated for MCC at our institution, only seventeen patients had adequate formalin-fixed paraffin-embedded archival tissue and follow-up to be included in the study. Patients were stratified into good, moderate, or poor prognosis. Laser capture microdissection was used to isolate tumor cells for subsequent RNA isolation and gene expression analysis with Affymetrix GeneChip Human Exon 1.0 ST arrays. Among the 191 genes demonstrating significant differential expression between prognostic groups, keratin 20 and neurofilament protein have previously been identified in studies of MCC and were significantly upregulated in tumors from patients with a poor prognosis. Immunohistochemistry further established that keratin 20 was overexpressed in the poor prognosis tumors. In addition, novel genes of interest such as phospholipase A2 group X, kinesin family member 3A, tumor protein D52, mucin 1, and KIT were upregulated in specimens from patients with poor prognosis. Our pilot study identified several gene expression differences which could be used in the future as prognostic biomarkers in MCC patients.

  7. Predicting growth and mortality of bivalve larvae using gene expression and supervised machine learning.

    Science.gov (United States)

    Bassim, Sleiman; Chapman, Robert W; Tanguy, Arnaud; Moraga, Dario; Tremblay, Rejean

    2015-12-01

    It is commonly known that the nature of the diet has diverse consequences on larval performance and longevity; however, it is still unclear which genes have critical impacts on bivalve development and which pathways are of particular importance in their vulnerability or resistance. First, we show that a diet deficient in essential fatty acids (EFAs) produces higher larval mortality rates, reduced shell growth, and lower postlarval performance, all of which are positively correlated with a decline in the levels of arachidonic and eicosapentaenoic acids, two EFAs known as eicosanoid precursors. Eicosanoids affect the cell inflammatory reactions and are synthesized from long-chain EFAs. Second, we show for the first time that a deficiency in eicosanoid precursors is associated with a network of 29 genes. Their differential regulation can lead to slower growth and higher mortality of Mytilus edulis larvae. Some of these genes are specific to bivalves and others are implicated at the same time in lipid metabolism and defense. Several genes are expressed only during pre-metamorphosis where they are essential for muscle or neurone development and biomineralization, but only in stress-induced larvae. Finally, we discuss how our networks of differentially expressed genes might dynamically alter the development of marine bivalves, especially under dietary influence.

  8. Gene expression signature in organized and growth arrested mammary acini predicts good outcome in breast cancer

    Energy Technology Data Exchange (ETDEWEB)

    Fournier, Marcia V.; Martin, Katherine J.; Kenny, Paraic A.; Xhaja, Kris; Bosch, Irene; Yaswen, Paul; Bissell, Mina J.

    2006-02-08

    To understand how non-malignant human mammary epithelial cells (HMEC) transit from a disorganized proliferating to an organized growth arrested state, and to relate this process to the changes that occur in breast cancer, we studied gene expression changes in non-malignant HMEC grown in three-dimensional cultures, and in a previously published panel of microarray data for 295 breast cancer samples. We hypothesized that the gene expression pattern of organized and growth arrested mammary acini would share similarities with breast tumors with good prognoses. Using Affymetrix HG-U133A microarrays, we analyzed the expression of 22,283 gene transcripts in two HMEC cell lines, 184 (finite life span) and HMT3522 S1 (immortal non-malignant), on successive days post-seeding in a laminin-rich extracellular matrix assay. Both HMECs underwent growth arrest in G0/G1 and differentiated into polarized acini between days 5 and 7. We identified gene expression changes with the same temporal pattern in both lines. We show that genes that are significantly lower in the organized, growth arrested HMEC than in their proliferating counterparts can be used to classify breast cancer patients into poor and good prognosis groups with high accuracy. This study represents a novel unsupervised approach to identifying breast cancer markers that may be of use clinically.

  9. Influence of mRNA decay rates on the computational prediction of transcription rate profiles from gene expression profiles

    Indian Academy of Sciences (India)

    Chi-Fang Chin; Arthur Chun-Chieh Shih; Kuo-Chin Fan

    2007-12-01

    The abundance of an mRNA species depends not only on the transcription rate at which it is produced, but also on its decay rate, which determines how quickly it is degraded. Both transcription rate and decay rate are important factors in regulating gene expression. With the advance of the age of genomics, there are a considerable number of gene expression datasets, in which the expression profiles of tens of thousands of genes are often non-uniformly sampled. Recently, numerous studies have proposed to infer the regulatory networks from expression profiles. Nevertheless, how mRNA decay rates affect the computational prediction of transcription rate profiles from expression profiles has not been well studied. To understand these influences, we present a systematic method based on a gene dynamic regulation model that takes mRNA decay rates, expression profiles and transcription profiles into account. Generally speaking, an expression profile can be regarded as a representation of a biological condition. The rationale behind the concept is that the biological condition is reflected in the changes of the gene expression profile. Basically, the biological condition is associated either with the cell cycle or with environmental stresses. The expression profiles of genes that belong to the former, so-called cell cycle data, are characterized by periodicity, whereas the expression profiles of genes that belong to the latter, so-called condition-specific data, are characterized by a steep change after a specific time without periodicity. In this paper, we evaluate the systematic method on simulated expression data as well as real expression data, including yeast cell cycle data and condition-specific data (glucose-limitation data). The results indicate that mRNA decay rates do not significantly influence the computational prediction of transcription-rate profiles for cell cycle data. On the contrary, the magnitudes and shapes of transcription-rate profiles for
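
    The dependence of abundance on both production and decay can be made concrete with a short sketch. It assumes first-order decay kinetics, dm/dt = T(t) - k·m(t), which is a common textbook model and not necessarily the gene dynamic regulation model used in the paper; the function name `transcription_rate` and the sample values are hypothetical.

```python
import numpy as np

def transcription_rate(times, expression, decay_rate):
    """Estimate T(t) from an expression profile m(t), assuming dm/dt = T(t) - k * m(t).

    times      : sampling times (may be non-uniformly spaced)
    expression : mRNA abundance m(t) at those times
    decay_rate : first-order decay constant k (same time units as `times`)
    """
    t = np.asarray(times, dtype=float)
    m = np.asarray(expression, dtype=float)
    dm_dt = np.gradient(m, t)  # finite differences that honour uneven spacing
    return dm_dt + decay_rate * m

# Non-uniformly sampled, steady-state profile: production must balance decay.
sample_times = np.array([0.0, 1.0, 2.5, 4.0, 7.0])
steady_profile = np.full(sample_times.shape, 10.0)
steady_rate = transcription_rate(sample_times, steady_profile, decay_rate=0.2)
```

    For the constant profile, the estimated transcription rate equals k·m at every time point, which shows why the same expression profile implies different transcription-rate magnitudes under different decay rates.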

  10. Stem cell-like gene expression in ovarian cancer predicts type II subtype and prognosis.

    Directory of Open Access Journals (Sweden)

    Matthew Schwede

    Although ovarian cancer is often initially chemotherapy-sensitive, the vast majority of tumors eventually relapse and patients die of increasingly aggressive disease. Cancer stem cells are believed to have properties that allow them to survive therapy and may drive recurrent tumor growth. Cancer stem cells or cancer-initiating cells are a rare cell population and difficult to isolate experimentally. Genes that are expressed by stem cells may characterize a subset of less differentiated tumors and aid in prognostic classification of ovarian cancer. The purpose of this study was the genomic identification and characterization of a subtype of ovarian cancer that has stem cell-like gene expression. Using human and mouse gene signatures of embryonic, adult, or cancer stem cells, we performed an unsupervised bipartition class discovery on expression profiles from 145 serous ovarian tumors to identify a stem-like and more differentiated subgroup. Subtypes were reproducible and were further characterized in four independent, heterogeneous ovarian cancer datasets. We identified a stem-like subtype characterized by a 51-gene signature, which is significantly enriched in tumors with properties of Type II ovarian cancer: high grade, serous tumors, and poor survival. Conversely, the differentiated tumors share properties with Type I, including lower grade and mixed histological subtypes. The stem cell-like signature was prognostic within high-stage serous ovarian cancer, classifying a small subset of high-stage tumors with better prognosis, in the differentiated subtype. In multivariate models that adjusted for common clinical factors (including grade, stage, age), the subtype classification was still a significant predictor of relapse. The prognostic stem-like gene signature yields new insights into prognostic differences in ovarian cancer, provides a genomic context for defining Type I/II subtypes, and potential gene targets which following further

  11. Germline and somatic mutations in homologous recombination genes predict platinum response and survival in ovarian, fallopian tube, and peritoneal carcinomas.

    Science.gov (United States)

    Pennington, Kathryn P; Walsh, Tom; Harrell, Maria I; Lee, Ming K; Pennil, Christopher C; Rendi, Mara H; Thornton, Anne; Norquist, Barbara M; Casadei, Silvia; Nord, Alexander S; Agnew, Kathy J; Pritchard, Colin C; Scroggins, Sheena; Garcia, Rochelle L; King, Mary-Claire; Swisher, Elizabeth M

    2014-02-01

    Hallmarks of germline BRCA1/2-associated ovarian carcinomas include chemosensitivity and improved survival. The therapeutic impact of somatic BRCA1/2 mutations and mutations in other homologous recombination DNA repair genes is uncertain. Using targeted capture and massively parallel genomic sequencing, we assessed 390 ovarian carcinomas for germline and somatic loss-of-function mutations in 30 genes, including BRCA1, BRCA2, and 11 other genes in the homologous recombination pathway. Thirty-one percent of ovarian carcinomas had a deleterious germline (24%) and/or somatic (9%) mutation in one or more of the 13 homologous recombination genes: BRCA1, BRCA2, ATM, BARD1, BRIP1, CHEK1, CHEK2, FAM175A, MRE11A, NBN, PALB2, RAD51C, and RAD51D. Nonserous ovarian carcinomas had similar rates of homologous recombination mutations to serous carcinomas (28% vs. 31%, P = 0.6), including clear cell, endometrioid, and carcinosarcoma. The presence of germline and somatic homologous recombination mutations was highly predictive of primary platinum sensitivity (P = 0.0002) and improved overall survival (P = 0.0006), with a median overall survival of 66 months in germline homologous recombination mutation carriers, 59 months in cases with a somatic homologous recombination mutation, and 41 months for cases without a homologous recombination mutation. Germline or somatic mutations in homologous recombination genes are present in almost one third of ovarian carcinomas, including both serous and nonserous histologies. Somatic BRCA1/2 mutations and mutations in other homologous recombination genes have a similar positive impact on overall survival and platinum responsiveness as germline BRCA1/2 mutations. The similar rate of homologous recombination mutations in nonserous carcinomas supports their inclusion in PARP inhibitor clinical trials. ©2013 AACR.

  12. Identification of Gene Networks for Residual Feed Intake in Angus Cattle Using Genomic Prediction and RNA-seq.

    Directory of Open Access Journals (Sweden)

    Kristina L Weber

    Improvement in feed conversion efficiency can improve the sustainability of beef cattle production, but genomic selection for feed efficiency affects many underlying molecular networks and physiological traits. This study describes the differences between steer progeny of two influential Angus bulls with divergent genomic predictions for residual feed intake (RFI). Eight steer progeny of each sire were phenotyped for growth and feed intake from 8 mo. of age (average BW 254 kg, with a mean difference between sire groups of 4.8 kg) until slaughter at 14-16 mo. of age (average BW 534 kg, sire group difference of 28.8 kg). Terminal samples from pituitary gland, skeletal muscle, liver, adipose, and duodenum were collected from each steer for transcriptome sequencing. Gene expression networks were derived using partial correlation and information theory (PCIT), including differentially expressed (DE) genes, tissue specific (TS) genes, transcription factors (TF), and genes associated with RFI from a genome-wide association study (GWAS). Relative to progeny of the high RFI sire, progeny of the low RFI sire had -0.56 kg/d finishing period RFI (P = 0.05), -1.08 finishing period feed conversion ratio (P = 0.01), +3.3 kg^0.75 finishing period metabolic mid-weight (MMW; P = 0.04), +28.8 kg final body weight (P = 0.01), -12.9 feed bunk visits per day (P = 0.02) with +0.60 min/visit duration (P = 0.01), and +0.0045 carcass specific gravity (weight in air/weight in air-weight in water, a predictor of carcass fat content; P = 0.03). RNA-seq identified 633 DE genes between sire groups among 17,016 expressed genes. PCIT analysis identified >115,000 significant co-expression correlations between genes and 25 TF hubs, i.e. controllers of clusters of DE, TS, and GWAS SNP genes. Pathway analysis suggests low RFI bull progeny possess heightened gut inflammation and reduced fat deposition. This multi-omics analysis shows how differences in RFI genomic breeding values can impact other

  13. Identification of Gene Networks for Residual Feed Intake in Angus Cattle Using Genomic Prediction and RNA-seq.

    Science.gov (United States)

    Weber, Kristina L; Welly, Bryan T; Van Eenennaam, Alison L; Young, Amy E; Porto-Neto, Laercio R; Reverter, Antonio; Rincon, Gonzalo

    2016-01-01

    Improvement in feed conversion efficiency can improve the sustainability of beef cattle production, but genomic selection for feed efficiency affects many underlying molecular networks and physiological traits. This study describes the differences between steer progeny of two influential Angus bulls with divergent genomic predictions for residual feed intake (RFI). Eight steer progeny of each sire were phenotyped for growth and feed intake from 8 mo. of age (average BW 254 kg, with a mean difference between sire groups of 4.8 kg) until slaughter at 14-16 mo. of age (average BW 534 kg, sire group difference of 28.8 kg). Terminal samples from pituitary gland, skeletal muscle, liver, adipose, and duodenum were collected from each steer for transcriptome sequencing. Gene expression networks were derived using partial correlation and information theory (PCIT), including differentially expressed (DE) genes, tissue specific (TS) genes, transcription factors (TF), and genes associated with RFI from a genome-wide association study (GWAS). Relative to progeny of the high RFI sire, progeny of the low RFI sire had -0.56 kg/d finishing period RFI (P = 0.05), -1.08 finishing period feed conversion ratio (P = 0.01), +3.3 kg^0.75 finishing period metabolic mid-weight (MMW; P = 0.04), +28.8 kg final body weight (P = 0.01), -12.9 feed bunk visits per day (P = 0.02) with +0.60 min/visit duration (P = 0.01), and +0.0045 carcass specific gravity (weight in air/weight in air-weight in water, a predictor of carcass fat content; P = 0.03). RNA-seq identified 633 DE genes between sire groups among 17,016 expressed genes. PCIT analysis identified >115,000 significant co-expression correlations between genes and 25 TF hubs, i.e. controllers of clusters of DE, TS, and GWAS SNP genes. Pathway analysis suggests low RFI bull progeny possess heightened gut inflammation and reduced fat deposition. This multi-omics analysis shows how differences in RFI genomic breeding values can impact other
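
    The partial-correlation half of PCIT can be illustrated with the standard first-order partial correlation between two genes given a third. This sketch covers only that formula; the information-theoretic thresholding that PCIT applies afterwards is not reproduced, and the function name `partial_correlation` and the simulated data are hypothetical.

```python
import numpy as np

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y after controlling for z."""
    r = np.corrcoef([x, y, z])
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))

# Simulated expression: genes x and y are both driven by regulator z,
# but have no direct relationship with each other.
rng = np.random.default_rng(0)
regulator = rng.normal(size=5000)
gene_x = regulator + 0.5 * rng.normal(size=5000)
gene_y = regulator + 0.5 * rng.normal(size=5000)

raw_r = np.corrcoef(gene_x, gene_y)[0, 1]
partial_r = partial_correlation(gene_x, gene_y, regulator)
```

    In the simulation the raw correlation between the two genes is high, while the partial correlation given the shared regulator is close to zero, which is how such a method separates direct co-expression links from links explained by a common hub.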

  14. A Global Genomic and Genetic Strategy to Identify, Validate and Use Gene Signatures of Xenobiotic-Responsive Transcription Factors in Prediction of Pathway Activation in the Mouse Liver

    Science.gov (United States)

    Many drugs and environmentally-relevant chemicals activate xenobiotic-responsive transcription factors. Identification of target genes of these factors would be useful in predicting pathway activation in in vitro chemical screening as well as their involvement in disease states. ...

  15. Prediction of G gene epitopes of viral hemorrhagic septicemia virus and eukaryotic expression of major antigen determinant sequence.

    Science.gov (United States)

    Sun, T; Yin, W-L; Fang, B-H; Wang, Q; Liang, C-Z; Yue, Z-Q

    2017-08-15

    This study aims to express the main antigen domain of the fish viral hemorrhagic septicemia virus (VHSV) G protein using the Bac-to-Bac expression system. Using bioinformatics tools, B cell epitopes of the VHSV G gene were predicted, and the G main antigen domain (GM) was optimized. The GM gene was inserted into the pFastBac1 vector, and the recombinant plasmid was then transferred into DH10Bac to obtain recombinant rBacmid-GM. The resulting shuttle plasmid rBacmid-GM was transfected into sf9 cells. GM expression was examined by PCR and western blot. Results indicated that the G main antigen domain gene of VHSV, which contains 1209 bp, was successfully cloned and sequenced. PCR proved that the shuttle plasmid rBacmid-GM was constructed correctly. SDS-PAGE analysis detected a protein band of about 45 kD in the expression product of the G gene. The recombinant G protein reacted with VHSV-positive serum, as substantiated by western-blot analysis. In conclusion, the main antigen domain of VHSV G was successfully expressed in the Bac-to-Bac baculovirus system.

  16. Variation in key genes of serotonin and norepinephrine function predicts gamma-band activity during goal-directed attention.

    Science.gov (United States)

    Enge, Sören; Fleischhauer, Monika; Lesch, Klaus-Peter; Reif, Andreas; Strobel, Alexander

    2014-05-01

    Recent evidence shows that genetic variations in key regulators of serotonergic (5-HT) signaling explain variance in executive tasks, which suggests modulatory actions of 5-HT on goal-directed selective attention as one possible underlying mechanism. To investigate this link, 130 volunteers were genotyped for the 5-HT transporter gene-linked polymorphic region (5-HTTLPR) and for a variation (TPH2-703 G/T) of the TPH2 gene coding for the rate-limiting enzyme of 5-HT synthesis in the brain. Additionally, a functional polymorphism of the norepinephrine transporter gene (NET -3081 A/T) was considered, which was recently found to predict attention and working memory processes in interaction with serotonergic genes. The flanker-based Attention Network Test was used to assess goal-directed attention and the efficiency of attentional networks. Event-related gamma-band activity served to indicate selective attention at the intermediate phenotype level. The main findings were that 5-HTTLPR s allele and TPH2 G-allele homozygotes showed increased induced gamma-band activity during target processing when combined with the NET A/A genotype compared with other genotype combinations, and that gamma activity mediates the genotype-specific effects on task performance. The results further support a modulatory role of 5-HT and NE function in the top-down attentional selection of motivationally relevant over competing or irrelevant sensory input.

  17. The Structure, Expression, and Function Prediction of DAZAP2, A Down-Regulated Gene in Multiple Myeloma

    Institute of Scientific and Technical Information of China (English)

    Yiwu Shi; Saiqun Luo; Jianbin Peng; Chenghan Huang; Daren Tan; Weixin Hu

    2004-01-01

    In our previous studies, DAZAP2 gene expression was down-regulated in untreated patients with multiple myeloma (MM). To better study the structure and function of DAZAP2, a full-length cDNA was isolated from mononuclear cells of normal human bone marrow, sequenced and deposited in GenBank (AY430097). This sequence has an ORF (open reading frame) identical to those of NM_014764 from human testis and D31767 from the human cell line KG-1. Phylogenetic analysis and structure prediction reveal that DAZAP2 homologues are highly conserved throughout evolution and share a polyproline region and several potential SH2/SH3 binding sites. DAZAP2 occurs as a single-copy gene with a four-exon organization. Using electronic chromosomal localization in GenBank, we further noticed that the functional DAZAP2 gene is located on Chromosome 12 and its pseudogene is on Chromosome 2, though no genetic abnormalities of MM have been reported on Chromosome 12. The ORF of human DAZAP2 encodes a 17-kDa protein, which is highly similar to mouse Prtb. The DAZAP2 protein is mainly localized in the cytoplasm with a discrete, punctate distribution pattern. DAZAP2 may be associated with carcinogenesis of MM and participate in yet-to-be identified signaling pathways that regulate proliferation and differentiation of plasma cells.

  18. Exome mutation burden predicts clinical outcome in ovarian cancer carrying mutated BRCA1 and BRCA2 genes

    DEFF Research Database (Denmark)

    Birkbak, Nicolai Juul; Kochupurakkal, Bose; Gonzalez-Izarzugaza, Jose Maria;

    2013-01-01

    Reliable biomarkers predicting resistance or sensitivity to anti-cancer therapy are critical for oncologists to select proper therapeutic drugs in individual cancer patients. Ovarian and breast cancer patients carrying germline mutations in BRCA1 or BRCA2 genes are often sensitive to DNA damaging...... drugs and relative to non-mutation carriers present a favorable clinical outcome following therapy. Genome sequencing studies have shown a high number of mutations in the tumor genome in patients carrying BRCA1 or BRCA2 mutations (mBRCA). The present study used exome-sequencing and SNP 6 array data...... had either germline or somatic mutations of BRCA1 or BRCA2 genes. The results revealed that the Nmut was significantly lower in the chemotherapy-resistant mBRCA HGSOC defined by progression within 6 months after completion of first line platinum-based chemotherapy. We found a significant association...

  19. Gene signatures derived from a c-MET-driven liver cancer mouse model predict survival of patients with hepatocellular carcinoma.

    Directory of Open Access Journals (Sweden)

    Irena Ivanovska

    Biomarkers derived from gene expression profiling data may have a high false-positive rate and must be rigorously validated using independent clinical data sets, which are not always available. Although animal model systems could provide alternative data sets to formulate hypotheses and limit the number of signatures to be tested in clinical samples, the predictive power of such an approach is not yet proven. The present study aims to analyze the molecular signatures of liver cancer in a c-MET-transgenic mouse model and investigate their prognostic relevance to human hepatocellular carcinoma (HCC). Tissue samples were obtained from tumor (TU), adjacent non-tumor (AN) and distant normal (DN) liver in Tet-operator regulated (TRE) human c-MET transgenic mice (n = 21) as well as from a Chinese cohort of 272 HBV- and 9 HCV-associated HCC patients. Whole genome microarray expression profiling was conducted on Affymetrix gene expression chips, and the prognostic significance of gene expression signatures was evaluated across the two species. Our data revealed parallels between mouse and human liver tumors, including down-regulation of metabolic pathways and up-regulation of cell cycle processes. The mouse tumors were most similar to a subset of patient samples characterized by activation of the Wnt pathway, but distinctive in the p53 pathway signals. Of potential clinical utility, we identified a set of genes that were down-regulated in both mouse tumors and human HCC and had significant predictive power for overall and disease-free survival; these genes were highly enriched for metabolic functions. In conclusion, this study provides evidence that a disease model can serve as a possible platform for generating hypotheses to be tested in human tissues and highlights an efficient method for generating biomarker signatures before extensive clinical trials have been initiated.

  20. In silico prediction and characterization of secondary metabolite biosynthetic gene clusters in the wheat pathogen Zymoseptoria tritici.

    Science.gov (United States)

    Cairns, Timothy; Meyer, Vera

    2017-08-17

    Fungal pathogens of plants produce diverse repertoires of secondary metabolites, which have functions ranging from iron acquisition, defense against immune perturbation, to toxic assaults on the host. The wheat pathogen Zymoseptoria tritici causes Septoria tritici blotch, a foliar disease which is a significant threat to global food security. Currently, there is limited knowledge of the secondary metabolite arsenal produced by Z. tritici, which significantly restricts mechanistic understanding of infection. In this study, we analyzed the genome of Z. tritici isolate IP0323 to identify putative secondary metabolite biosynthetic gene clusters, and used comparative genomics to predict their encoded products. We identified 32 putative secondary metabolite clusters. These were physically enriched at subtelomeric regions, which may facilitate diversification of cognate products by rapid gene rearrangement or mutations. Comparative genomics revealed a four gene cluster with significant similarity to the ferrichrome-A biosynthetic locus of the maize pathogen Ustilago maydis, suggesting this siderophore is deployed by Z. tritici to acquire iron. The Z. tritici genome also contains several isoprenoid biosynthetic gene clusters, including one with high similarity to a carotenoid/opsin producing locus in several fungi. Furthermore, we identify putative phytotoxin biosynthetic clusters, suggesting Z. tritici can produce an epipolythiodioxopiperazine, and a polyketide and non-ribosomal peptide with predicted structural similarities to fumonisin and the Alternaria alternata AM-toxin, respectively. Interrogation of an existing transcriptional dataset suggests stage specific deployment of numerous predicted loci during infection, indicating an important role of these secondary metabolites in Z. tritici disease. We were able to assign putative biosynthetic products to numerous clusters based on conservation amongst other fungi. However, analysis of the majority of secondary

  1. Predicting childhood effortful control from interactions between early parenting quality and children’s dopamine transporter gene haplotypes

    OpenAIRE

    2015-01-01

    Children’s observed effortful control (EC) at 30, 42, and 54 months (n = 145) was predicted from the interaction between mothers’ observed parenting with their 30-month-olds and three variants of the solute carrier family C6, member 3 (SLC6A3) dopamine transporter gene (single nucleotide polymorphisms in intron8 and intron13, and a 40 base pair variable number tandem repeat [VNTR] in the 3′-untranslated region [UTR]), as well as haplotypes of these variants. Significant moderating effects wer...

  2. A statistical approach towards the derivation of predictive gene sets for potency ranking of chemicals in the mouse embryonic stem cell test.

    Science.gov (United States)

    Schulpen, Sjors H W; Pennings, Jeroen L A; Tonk, Elisa C M; Piersma, Aldert H

    2014-03-21

    The embryonic stem cell test (EST) is applied as a model system for detection of embryotoxicants. The application of transcriptomics allows a more detailed effect assessment compared to the morphological endpoint. Genes involved in cell differentiation, modulated by chemical exposures, may be useful as biomarkers of developmental toxicity. We describe a statistical approach to obtain a predictive gene set for toxicity potency ranking of compounds within one class. This resulted in a gene set based on differential gene expression across concentration-response series of phthalatic monoesters. We determined the concentration at which gene expression was changed at least 1.5-fold. Genes responding with the same potency ranking in vitro and in vivo embryotoxicity were selected. A leave-one-out cross-validation showed that the relative potency of each phthalate was always predicted correctly. The classical morphological 50% effect level (ID50) in EST was similar to the predicted concentration using gene set expression responses. A general down-regulation of development-related genes and up-regulation of cell-cycle related genes was observed, reminiscent of the differentiation inhibition in EST. This study illustrates the feasibility of applying dedicated gene set selections as biomarkers for developmental toxicity potency ranking on the basis of in vitro testing in the EST.
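    The thresholding and ranking steps in this abstract can be illustrated with a minimal sketch. It assumes each compound has a concentration-response series of (concentration, fold change) pairs, that both up- and down-regulation count as a change, and that potency is ranked by the lowest concentration reaching the 1.5-fold threshold. The compound names and all numbers are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Tuple

FOLD_THRESHOLD = 1.5  # fold-change threshold stated in the abstract

# (concentration, fold change) pairs; fold change = exposed / control expression
Series = List[Tuple[float, float]]


def lowest_effective_concentration(
    series: Series, threshold: float = FOLD_THRESHOLD
) -> Optional[float]:
    """Return the lowest concentration with at least `threshold`-fold change.

    Assumption: a change in either direction qualifies, so a fold change of
    1/threshold or lower is treated like one of threshold or higher.
    """
    hits = [
        conc
        for conc, fold in series
        if fold >= threshold or fold <= 1.0 / threshold
    ]
    return min(hits) if hits else None


def potency_ranking(data: Dict[str, Series]) -> List[str]:
    """Order compounds from most to least potent (lowest concentration first)."""
    lec = {name: lowest_effective_concentration(s) for name, s in data.items()}
    # Compounds that never reach the threshold sort last.
    return sorted(lec, key=lambda name: (lec[name] is None, lec[name] or 0.0))


# Hypothetical illustration data, not values from the study.
HYPOTHETICAL_DATA: Dict[str, Series] = {
    "compound_B": [(0.1, 1.0), (1.0, 1.2), (10.0, 0.6)],
    "compound_A": [(0.1, 1.1), (1.0, 1.6), (10.0, 2.4)],
    "compound_C": [(0.1, 1.0), (1.0, 1.1), (10.0, 1.3)],
}

if __name__ == "__main__":
    print(potency_ranking(HYPOTHETICAL_DATA))
```

    The abstract does not say how the study aggregated individual genes into a gene set, so this sketch covers only a single expression series per compound.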

  3. Large-Scale Orthology Predictions for Inferring Gene Functions Across Multiple Species

    Science.gov (United States)

    2010-06-01

    mutations associated with cancer by studying their corresponding orthologous genes in mice (Denny, 2000). Moreover, the identification of orthologous...taxonomic profiling of eukaryotic model organisms: a comparison of orthologous proteins encoded by the human, fly, nematode, and yeast genomes.” Genome

  4. Modularity in the gain and loss of genes: applications for function prediction

    NARCIS (Netherlands)

    Ettema, T.J.G.; Oost, van der J.; Huynen, M.

    2001-01-01

    Genes that are clustered on multiple genomes and are likely to functionally interact tend to be gained or lost together during genome evolution. Here, we demonstrate that exceptions to this pattern indicate relatively distant functional interactions between the encoded proteins. Hence, this can be u

  5. Dopamine Receptor D4 Gene Variation Predicts Preschoolers' Developing Theory of Mind

    Science.gov (United States)

    Lackner, Christine; Sabbagh, Mark A.; Hallinan, Elizabeth; Liu, Xudong; Holden, Jeanette J. A.

    2012-01-01

    Individual differences in preschoolers' understanding that human action is caused by internal mental states, or representational theory of mind (RTM), are heritable, as are developmental disorders such as autism in which RTM is particularly impaired. We investigated whether polymorphisms of genes affecting dopamine (DA) utilization and metabolism…

  6. Antimicrobial susceptibility testing in predicting the presence of carbapenemase genes in Enterobacteriaceae in South Africa.

    Science.gov (United States)

    Singh-Moodley, Ashika; Perovic, Olga

    2016-10-04

    Carbapenem-resistant Enterobacteriaceae (CRE) are a concern in South Africa and worldwide. It is therefore important that these organisms be accurately identified for infection prevention and control purposes. In this study, 1193 suspected CREs from 46 laboratories in seven provinces of South Africa were assessed to confirm the prevalence of carbapenemase genes among our referral diagnostic isolates for the period 2012 to 2015. We compared the antimicrobial susceptibility testing method used in the reference laboratory with the polymerase chain reaction (PCR), which is used as the gold standard. Organism identification and antimicrobial susceptibility testing were performed using automated systems, and DNA was extracted using a crude boiling method. The presence of carbapenemase-producing genes (bla NDM, bla KPC, bla OXA-48&variants, bla GES, bla IMP and bla VIM) was screened for using a multiplex real-time PCR. Sixty-eight percent (n = 812) of the isolates harboured a carbapenemase-producing gene; the three most common genes were bla NDM, bla OXA-48&variants and bla VIM. The majority of the carbapenemase-producing Enterobacteriaceae (CPE) isolates were Klebsiella species (71 %). The Microscan® Walkaway system used for the screening of carbapenemase production, with a minimal inhibitory concentration (MIC) breakpoint of less than 0.5 taken as susceptible for ertapenem, was 98 % sensitive but had a low specificity (13 %). From this study we can conclude that carbapenemase-producing Enterobacteriaceae are increasing in South Africa and that the use of phenotypic methods for detection of CPEs showed good sensitivity but lacked specificity.
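    For readers less familiar with the two reported metrics, the sketch below shows how sensitivity and specificity are computed when PCR is treated as the reference standard. The counts are purely illustrative, chosen only so that the arithmetic reproduces 98 % and 13 %; they are not the study's actual confusion-matrix counts, which the abstract does not report.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of PCR-positive isolates that the phenotypic test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of PCR-negative isolates that the phenotypic test clears."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (assumption): per 100 PCR-positive and
# 100 PCR-negative isolates.
ILLUSTRATIVE = {"tp": 98, "fn": 2, "tn": 13, "fp": 87}

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(ILLUSTRATIVE['tp'], ILLUSTRATIVE['fn']):.2f}")
    print(f"specificity = {specificity(ILLUSTRATIVE['tn'], ILLUSTRATIVE['fp']):.2f}")
```

    A test with this profile misses few carbapenemase producers but flags most non-producers as well, which is why the abstract describes good sensitivity with poor specificity.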

  8. DNMT3B gene amplification predicts resistance to DNA demethylating drugs.

    Science.gov (United States)

    Simó-Riudalbas, Laia; Melo, Sónia A; Esteller, Manel

    2011-07-01

    Disruption of the DNA methylation landscape is one of the most common features of human tumors. However, genetic alterations of DNA methyltransferases (DNMTs) have not been described in carcinogenesis. Herein, we show that pancreatic and breast cancer cells undergo gene amplification of the DNA methyltransferase 3B (DNMT3B). The presence of extra copies of the DNMT3B gene is linked to higher levels of the corresponding mRNA and protein. Most importantly, the elevated gene dosage of DNMT3B is associated with increased resistance to the growth-inhibitory effect mediated by DNA demethylating agents. In particular, cancer cells harboring DNMT3B gene amplification are less sensitive to the decrease in cell viability caused by 5-azacytidine (Vidaza), 5-aza-2-deoxycytidine (Decitabine), and SGI-1027. Overall, the data confirm DNMT3B as a bona fide oncogene in human cancer and support the incorporation of the DNMT3B copy number assay into current clinical trials assessing the efficacy of DNA demethylating drugs in solid tumors.

  9. REST mediates androgen receptor actions on gene repression and predicts early recurrence of prostate cancer

    DEFF Research Database (Denmark)

    Svensson, Charlotte; Ceder, Jens; Iglesias Gato, Diego

    2014-01-01

    The androgen receptor (AR) is a key regulator of prostate tumorigenesis through actions that are not fully understood. We identified the repressor element (RE)-1 silencing transcription factor (REST) as a mediator of AR actions on gene repression. Chromatin immunoprecipitation showed that AR binds...

  10. Gene expression array analyses predict increased proto-oncogene expression in MMTV induced mammary tumors.

    Science.gov (United States)

    Popken-Harris, Pamela; Kirchhof, Nicole; Harrison, Ben; Harris, Lester F

    2006-08-01

    Exogenous infection by milk-borne mouse mammary tumor viruses (MMTV) typically induces mouse mammary tumors in genetically susceptible mice at a rate of 90-95% by 1 year of age. In contrast to other transforming retroviruses, MMTV acts as an insertional mutagen and, under the influence of steroid hormones, induces oncogenic transformation after insertion into the host genome. As these events correspond with increases in adjacent proto-oncogene transcription, we used expression array profiling to determine which commonly associated MMTV insertion site proto-oncogenes were transcriptionally active in MMTV-induced mouse mammary tumors. To verify our gene expression array results, we developed real-time quantitative RT-PCR assays for the common MMTV insertion site genes found in RIII/Sa mice (int-1/wnt-1, int-2/fgf-3, int-3/Notch 4, and fgf8/AIGF) as well as two genes that were consistently up-regulated (CCND1, and MAT-8) and two genes that were consistently down-regulated (FN1 and MAT-8) in the MMTV-induced tumors as compared to normal mammary gland. Finally, each tumor was also examined histopathologically. Our expression array findings support a model whereby just one or a few common MMTV insertions into the host genome set up a dominant cascade of events that leave a characteristic molecular signature.

  11. Risk alleles of USF1 gene predict cardiovascular disease of women in two prospective studies.

    Directory of Open Access Journals (Sweden)

    2006-05-01

    Upstream transcription factor 1 (USF1) is a ubiquitously expressed transcription factor controlling several critical genes in lipid and glucose metabolism. Of some 40 genes regulated by USF1, several are involved in the molecular pathogenesis of cardiovascular disease (CVD). Although the USF1 gene has been shown to have a critical role in the etiology of familial combined hyperlipidemia, which predisposes to early CVD, the gene's potential role as a risk factor for CVD events at the population level has not been established. Here we report the results from a prospective genetic-epidemiological study of the association between the USF1 variants, CVD, and mortality in two large Finnish cohorts. Haplotype-tagging single nucleotide polymorphisms exposing all common allelic variants of USF1 were genotyped in a prospective case-cohort design with two distinct cohorts followed up during 1992-2001 and 1997-2003. The total number of follow-up years was 112,435 in 14,140 individuals, of which 2,225 were selected for genotyping based on the case-cohort study strategy. After adjustment for conventional risk factors, we observed an association of USF1 with CVD and mortality among females. In combined analysis of the two cohorts, female carriers of a USF1 risk haplotype had a 2-fold risk of a CVD event (hazard ratio [HR] 2.02; 95% confidence interval [CI] 1.16-3.53; p = 0.01) and an increased risk of all-cause mortality (HR 2.52; 95% CI 1.46-4.35; p = 0.0009). A putative protective haplotype of USF1 was also identified. Our study shows how a gene identified in exceptional families proves to be important also at the population level, implying that allelic variants of USF1 significantly influence the prospective risk of CVD and even all-cause mortality in females.

  12. An Individual-Based Diploid Model Predicts Limited Conditions Under Which Stochastic Gene Expression Becomes Advantageous

    KAUST Repository

    Matsumoto, Tomotaka

    2015-11-24

    Recent studies suggest the existence of stochasticity in gene expression (SGE) in many organisms, and its non-negligible effect on their phenotype and fitness. To date, however, how SGE affects the key parameters of population genetics is not well understood. SGE can increase phenotypic variation and act as a load for individuals if they are at the adaptive optimum in a stable environment. On the other hand, part of the phenotypic variation caused by SGE might become advantageous if individuals at the adaptive optimum become genetically less adapted, for example due to an environmental change. Furthermore, SGE of unimportant genes might have little or no fitness consequences. Thus, SGE can be advantageous, disadvantageous, or selectively neutral depending on its context. In addition, there might be a genetic basis that regulates the magnitude of SGE, often referred to as “modifier genes,” but little is known about the conditions under which such an SGE-modifier gene evolves. In the present study, we conducted individual-based computer simulations to examine these conditions in a diploid model. In the simulations, we considered for simplicity a single locus that determines organismal fitness, with SGE at this locus creating fitness variation in a stochastic manner. We also considered another locus that modifies the magnitude of SGE. Our results suggested that SGE was always deleterious in stable environments and increased the fixation probability of deleterious mutations in this model. Even under frequently changing environmental conditions, only very strong natural selection made SGE adaptive. These results suggest that the evolution of SGE-modifier genes requires a strict balance among the strength of natural selection, the magnitude of SGE, and the frequency of environmental changes. However, the degree of dominance affected the condition under which SGE becomes advantageous, indicating a better opportunity for the evolution of SGE in different genetic

  13. Prediction of the damage-associated non-synonymous single nucleotide polymorphisms in the human MC1R gene.

    Directory of Open Access Journals (Sweden)

    Diego Hepp

    The melanocortin 1 receptor (MC1R) is involved in the control of melanogenesis. Polymorphisms in this gene have been associated with variation in skin and hair color and with elevated risk for the development of melanoma. Here we used 11 computational tools based on different approaches to predict the damage-associated non-synonymous single nucleotide polymorphisms (nsSNPs) in the coding region of the human MC1R gene. Among the 92 nsSNPs arranged according to the predictions, 62% were classified as damaging by more than five tools. The classification was significantly correlated with the scores of two consensus programs. Alleles associated with the red hair color (RHC) phenotype and with the risk of melanoma were examined. The R variants D84E, R142H, R151C, I155T, R160W and D294H were classified as damaging by the majority of the tools, while the r variants V60L, V92M and R163Q were predicted as neutral by most of the programs. The combination of the prediction tools results in 14 nsSNPs indicated as the most damaging mutations in MC1R (L48P, R67W, H70Y, P72L, S83P, R151H, S172I, L206P, T242I, G255R, P256S, C273Y, C289R and R306H); C273Y was shown to be highly damaging by the SIFT, Polyphen-2, MutPred, PANTHER and PROVEAN scores. The computational analysis proved capable of identifying the potentially damaging nsSNPs in MC1R, which are candidates for further laboratory studies of the functional and pharmacological significance of the alterations in the receptor and the phenotypic outcomes.

  14. Cheminformatics Approach to Gene Silencing: Z Descriptors of Nucleotides and SVM Regression Afford Predictive Models for siRNA Potency.

    Science.gov (United States)

    Ebalunode, Jerry O; Zheng, Weifan

    2010-12-17

    Short interfering RNA mediated gene silencing technology has been through tremendous development over the past decade, and has found broad applications in both basic biomedical research and pharmaceutical development. Critical to the effective use of this technology is the development of reliable algorithms to predict the potency and selectivity of siRNAs under study. Existing algorithms are mostly built upon sequence information of siRNAs and then employ statistical pattern recognition or machine learning techniques to derive rules or models. However, sequence-based features have limited ability to characterize siRNAs, especially chemically modified ones. In this study, we proposed a cheminformatics approach to describe siRNAs. Principal component scores (z1, z2, z3, z4) have been derived for each of the 5 nucleotides (A, U, G, C, T) from the descriptor matrix computed by the MOE program. Descriptors of a given siRNA sequence are simply the concatenation of the z values of its composing nucleotides. Thus, for each of the 2431 siRNA sequences in the Huesken dataset, 76 descriptors were generated for the 19-NT representation, and 84 descriptors were generated for the 21-NT representation of siRNAs. Support Vector Machine regression (SVMR) was employed to develop predictive models. In all cases, the models achieved Pearson correlation coefficient r and R about 0.84 and 0.65 for the training sets and test sets, respectively. A minimum of 25 % of the whole dataset was needed to obtain predictive models that could accurately predict 75 % of the remaining siRNAs. Thus, for the first time, a cheminformatics approach has been developed to successfully model the structure-potency relationship in siRNA-based gene silencing data, which has laid a solid foundation for quantitative modeling of chemically modified siRNAs.
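    The descriptor construction described in this abstract (four z scores per nucleotide, concatenated along the sequence) can be sketched as follows. The z-score values in the table are hypothetical placeholders: the real values come from a principal component analysis of MOE descriptors and are not given in the abstract. The function name is also an assumption made for illustration.

```python
from typing import Dict, List

# Hypothetical placeholder z scores (z1, z2, z3, z4) for each nucleotide.
# The published values are not reproduced here.
Z_SCORES: Dict[str, List[float]] = {
    "A": [0.1, 0.2, 0.3, 0.4],
    "U": [0.5, 0.6, 0.7, 0.8],
    "G": [0.9, 1.0, 1.1, 1.2],
    "C": [1.3, 1.4, 1.5, 1.6],
    "T": [1.7, 1.8, 1.9, 2.0],
}


def sirna_descriptors(sequence: str) -> List[float]:
    """Concatenate the per-nucleotide z scores along the sequence."""
    descriptors: List[float] = []
    for nucleotide in sequence.upper():
        if nucleotide not in Z_SCORES:
            raise ValueError(f"unsupported nucleotide: {nucleotide!r}")
        descriptors.extend(Z_SCORES[nucleotide])
    return descriptors


if __name__ == "__main__":
    # A 19-nt sequence yields 19 x 4 = 76 descriptors; a 21-nt one yields 84.
    print(len(sirna_descriptors("AUGCAUGCAUGCAUGCAUG")))
    print(len(sirna_descriptors("AUGCAUGCAUGCAUGCAUGTT")))
```

    The resulting fixed-length vectors are the kind of input a support vector regression model would be trained on; the regression step itself is not shown.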

  15. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    Energy Technology Data Exchange (ETDEWEB)

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.

  16. A method of predicting changes in human gene splicing induced by genetic variants in context of cis-acting elements

    Directory of Open Access Journals (Sweden)

    Hicks Chindo

    2010-01-01

    Background: Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference on non-canonical exons. Results: Our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in the DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer's and breast cancer could be explained by changes in predicted splicing patterns. Conclusions: We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.

  17. Prediction of pharmacological and xenobiotic responses to drugs based on time course gene expression profiles.

    Directory of Open Access Journals (Sweden)

    Tao Huang

    More and more people are concerned by the risk of unexpected side effects observed in the later steps of the development of new drugs, either in late clinical development or after marketing approval. In order to reduce the risk of side effects, it is important to look out for possible xenobiotic responses at an early stage. We attempt such an effort through a prediction that assumes that similarities in microarray profiles indicate shared mechanisms of action and/or toxicological responses among the chemicals being compared. A large time course microarray database derived from livers of compound-treated rats with thirty-four distinct pharmacological and toxicological responses was studied. The mRMR (Minimum-Redundancy-Maximum-Relevance) method and IFS (Incremental Feature Selection) were used to select a compact feature set (141 features) for the reduction of feature dimension and improvement of prediction performance. With these 141 features, the leave-one-out cross-validation prediction accuracy for the first-order response using NNA (Nearest Neighbor Algorithm) was 63.9%. Our method can be used to predict the pharmacological and xenobiotic responses of new compounds and to accelerate drug development.
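    The evaluation scheme named in this abstract, leave-one-out cross-validation of a nearest-neighbour classifier, can be sketched in a few lines. This is a generic illustration under stated assumptions: Euclidean distance is assumed (the abstract does not name the distance measure), the mRMR and IFS feature-selection steps are not shown, and the toy samples and labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, response label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(query: Sequence[float], training: List[Sample]) -> str:
    """Return the label of the training sample closest to the query."""
    _, label = min(training, key=lambda sample: euclidean(query, sample[0]))
    return label


def loocv_accuracy(samples: List[Sample]) -> float:
    """Hold out each sample once, predict it from the rest, report accuracy."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(features, training) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data; the last sample lies closer to class "b" and is
# therefore misclassified when held out.
TOY_SAMPLES: List[Sample] = [
    ([0.0, 0.0], "a"),
    ([0.1, 0.0], "a"),
    ([1.0, 1.0], "b"),
    ([1.1, 1.0], "b"),
    ([0.5, 0.6], "a"),
]

if __name__ == "__main__":
    print(loocv_accuracy(TOY_SAMPLES))
```

    In incremental feature selection, a loop of this kind would typically be rerun for growing prefixes of the ranked feature list, keeping the prefix with the best accuracy.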

  18. Predicting incomplete gene microarray data with the use of supervised learning algorithms

    CSIR Research Space (South Africa)

    Twala, B

    2010-10-01

    Full Text Available of many well-established supervised learning (SL) algorithms in an attempt to provide more accurate and automatic diagnosis class (cancer/non cancer) prediction. Virtually all research on SL addresses the task of learning to classify complete domain...

  19. ABC gene-ranking for prediction of drug-induced cholestasis in rats

    Directory of Open Access Journals (Sweden)

    Yauheniya Cherkas

    2016-01-01

    As legacy toxicogenomics databases have become available, improved data mining approaches are now key to extracting and visualizing subtle relationships between toxicants and gene expression. In the present study, a novel “aggregating bundles of clusters” (ABC) procedure was applied to separate cholestatic from non-cholestatic drugs and model toxicants in the Johnson & Johnson (Janssen) rat liver toxicogenomics database [3]. Drug-induced cholestasis is an important issue, particularly when a new compound enters the market with this liability, with standard preclinical models often mispredicting this toxicity. Three well-characterized cholestasis-responsive genes (Cyp7a1, Mrp3 and Bsep) were chosen from a previous in-house Janssen gene expression signature; these three genes show differing, non-redundant responses across the 90+ paradigm compounds in our database. Using the ABC procedure, extraneous contributions were minimized in comparisons of compound gene responses. All genes were assigned weights proportional to their correlations with Cyp7a1, Mrp3 and Bsep, and a resampling technique was used to derive a stable measure of compound similarity. The compounds that were known to be associated with rat cholestasis generally had small values of this measure relative to each other but also had large values of this measure relative to non-cholestatic compounds. Visualization of the data with the ABC-derived signature showed a very tight, essentially identically behaving cluster of robust human cholestatic drugs and experimental cholestatic toxicants (ethinyl estradiol, LPS, ANIT and methylene dianiline, disulfiram, naltrexone, methapyrilene, phenacetin, alpha-methyl dopa, flutamide, the NSAIDs indomethacin, flurbiprofen, diclofenac, flufenamic acid, sulindac, and nimesulide, butylated hydroxytoluene, piperonyl butoxide, and bromobenzene), some slightly less active compounds (3′-acetamidofluorene, amsacrine, hydralazine, tannic acid), some

  20. Polymorphism of CYP46A1 and PPARγ2 genes in risk prediction of primary open angle glaucoma among North Indian population

    Directory of Open Access Journals (Sweden)

    Anu Chandra

    2016-01-01

    Conclusion: Findings of this study suggest that CYP46A1 gene and PPARγ2 gene polymorphisms can be a predictive marker for early identification of population at risk of POAG, although a larger sample size is required to determine the role of these polymorphisms in the pathogenesis and course of POAG.

  1. Identification and Validation of a New Set of Five Genes for Prediction of Risk in Early Breast Cancer

    Directory of Open Access Journals (Sweden)

    Giorgio Mustacchi

    2013-05-01

    Molecular tests predicting the outcome of breast cancer patients based on gene expression levels can be used to assist in making treatment decisions after consideration of conventional markers. In this study we identified a subset of 20 mRNAs differentially regulated in breast cancer by analyzing several publicly available array gene expression data sets using R/Bioconductor. Using RTqPCR we evaluated 261 consecutive invasive breast cancer cases, not selected for age, adjuvant treatment, nodal and estrogen receptor status, from paraffin-embedded sections. The biological samples dataset was split into a training set (137 cases) and a validation set (124 cases). The gene signature was developed on the training set, and a multivariate stepwise Cox analysis selected five genes independently associated with DFS: FGF18 (HR = 1.13, p = 0.05), BCL2 (HR = 0.57, p = 0.001), PRC1 (HR = 1.51, p = 0.001), MMP9 (HR = 1.11, p = 0.08), SERF1A (HR = 0.83, p = 0.007). These five genes were combined into a linear score (signature) weighted according to the coefficients of the Cox model, as: 0.125FGF18 − 0.560BCL2 + 0.409PRC1 + 0.104MMP9 − 0.188SERF1A (HR = 2.7, 95% CI = 1.9–4.0, p < 0.001). The signature was then evaluated on the validation set, assessing its discrimination ability by a Kaplan-Meier analysis, using the same cut-offs classifying patients at low, intermediate or high risk of disease relapse as defined on the training set (p < 0.001). Our signature, after further clinical validation, could be proposed as a prognostic signature for disease-free survival in breast cancer patients where the indication for adjuvant chemotherapy added to endocrine treatment is uncertain.
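    The linear score quoted in this abstract can be illustrated with a short sketch. The coefficients are the ones given above; everything else is an assumption made for illustration: the function names, the expression units, and the two cut-off parameters (the abstract does not state the cut-off values, so they are left as arguments rather than guessed).

```python
# Cox-model coefficients as quoted in the abstract.
COEFFICIENTS = {
    "FGF18": 0.125,
    "BCL2": -0.560,
    "PRC1": 0.409,
    "MMP9": 0.104,
    "SERF1A": -0.188,
}


def signature_score(expression):
    """Weighted linear score for one patient.

    `expression` maps gene name -> expression level (units are an
    assumption; the abstract does not specify the scale).
    """
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing genes: {missing}")
    return sum(weight * expression[gene] for gene, weight in COEFFICIENTS.items())


def risk_group(score, low_cut, high_cut):
    """Assign a risk class from two caller-supplied (hypothetical) cut-offs."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "intermediate"
    return "high"
```

    In the study the cut-offs were defined on the training set and reused unchanged on the validation set; the sketch mirrors that by taking them as fixed inputs.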

  2. Survivin gene levels in the peripheral blood of patients with gastric cancer independently predict survival

    Directory of Open Access Journals (Sweden)

    Scalerta Romano

    2009-12-01

    Background: The detection of circulating tumor cells (CTC) is considered a promising tool for improving risk stratification in patients with solid tumors. We investigated whether the expression of CTC-related genes adds any prognostic power to the TNM staging system in patients with gastric carcinoma. Methods: Seventy patients with TNM stage I to IV gastric carcinoma were retrospectively enrolled. Peripheral blood samples were tested by means of quantitative real-time PCR (qrtPCR) for the expression of four CTC-related genes: carcinoembryonic antigen (CEA), cytokeratin-19 (CK19), vascular endothelial growth factor (VEGF) and Survivin (BIRC5). Results: Gene expression of Survivin, CK19, CEA and VEGF was higher than in normal controls in 98.6%, 97.1%, 42.9% and 38.6% of cases, respectively, suggesting a potential diagnostic value of both Survivin and CK19. At multivariable survival analysis, TNM staging and Survivin mRNA levels were retained as independent prognostic factors, demonstrating that Survivin expression in the peripheral blood adds prognostic information to the TNM system. In contrast with previously published data, the transcript abundance of CEA, CK19 and VEGF was not associated with patients' clinical outcome. Conclusions: Gene expression levels of Survivin add significant prognostic value to the current TNM staging system. The validation of these findings in larger prospective and multicentric series might lead to the implementation of this biomarker in the routine clinical setting in order to optimize risk stratification and ultimately personalize the therapeutic management of these patients.

  3. Gene expression signature of normal cell-of-origin predicts ovarian tumor outcomes.

    Directory of Open Access Journals (Sweden)

    Melissa A Merritt

    The potential role of the cell-of-origin in determining the tumor phenotype has been raised, but not adequately examined. We hypothesized that distinct cells-of-origin may play a role in determining ovarian tumor phenotype and outcome. Here we describe a new cell culture medium for in vitro culture of paired normal human ovarian (OV) and fallopian tube (FT) epithelial cells from donors without cancer. While these cells have been cultured individually for short periods of time, to our knowledge this is the first long-term culture of both cell types from the same donors. Through analysis of the gene expression profiles of the cultured OV/FT cells we identified a normal cell-of-origin gene signature that classified primary ovarian cancers into OV-like and FT-like subgroups; this classification correlated with significant differences in clinical outcomes. The identification of a prognostically significant gene expression signature derived solely from normal untransformed cells is consistent with the hypothesis that the normal cell-of-origin may be a source of ovarian tumor heterogeneity and the associated differences in tumor outcome.

  4. Migration phenology and breeding success are predicted by methylation of a photoperiodic gene in the barn swallow

    Science.gov (United States)

    Saino, Nicola; Ambrosini, Roberto; Albetti, Benedetta; Caprioli, Manuela; De Giorgio, Barbara; Gatti, Emanuele; Liechti, Felix; Parolini, Marco; Romano, Andrea; Romano, Maria; Scandolara, Chiara; Gianfranceschi, Luca; Bollati, Valentina; Rubolini, Diego

    2017-01-01

    Individuals often considerably differ in the timing of their life-cycle events, with major consequences for individual fitness, and, ultimately, for population dynamics. Phenological variation can arise from genetic effects but also from epigenetic modifications in DNA expression and translation. Here, we tested if CpG methylation at the poly-Q and 5′-UTR loci of the photoperiodic Clock gene predicted migration and breeding phenology of long-distance migratory barn swallows (Hirundo rustica) that were tracked year-round using light-level geolocators. Increasing methylation at Clock poly-Q was associated with earlier spring departure from the African wintering area, arrival date at the European breeding site, and breeding date. Higher methylation levels also predicted increased breeding success. Thus, we showed for the first time in any species that CpG methylation at a candidate gene may affect phenology and breeding performance. Methylation at Clock may be a candidate mechanism mediating phenological responses of migratory birds to ongoing climate change. PMID:28361883

  5. Migration phenology and breeding success are predicted by methylation of a photoperiodic gene in the barn swallow.

    Science.gov (United States)

    Saino, Nicola; Ambrosini, Roberto; Albetti, Benedetta; Caprioli, Manuela; De Giorgio, Barbara; Gatti, Emanuele; Liechti, Felix; Parolini, Marco; Romano, Andrea; Romano, Maria; Scandolara, Chiara; Gianfranceschi, Luca; Bollati, Valentina; Rubolini, Diego

    2017-03-31

    Individuals often considerably differ in the timing of their life-cycle events, with major consequences for individual fitness, and, ultimately, for population dynamics. Phenological variation can arise from genetic effects but also from epigenetic modifications in DNA expression and translation. Here, we tested if CpG methylation at the poly-Q and 5'-UTR loci of the photoperiodic Clock gene predicted migration and breeding phenology of long-distance migratory barn swallows (Hirundo rustica) that were tracked year-round using light-level geolocators. Increasing methylation at Clock poly-Q was associated with earlier spring departure from the African wintering area, arrival date at the European breeding site, and breeding date. Higher methylation levels also predicted increased breeding success. Thus, we showed for the first time in any species that CpG methylation at a candidate gene may affect phenology and breeding performance. Methylation at Clock may be a candidate mechanism mediating phenological responses of migratory birds to ongoing climate change.

  6. A machine learning approach for identifying amino acid signatures in the HIV env gene predictive of dementia.

    Science.gov (United States)

    Holman, Alexander G; Gabuzda, Dana

    2012-01-01

    The identification of nucleotide sequence variations in viral pathogens linked to disease and clinical outcomes is important for developing vaccines and therapies. However, identifying these genetic variations in rapidly evolving pathogens adapting to selection pressures unique to each host presents several challenges. Machine learning tools provide new opportunities to address these challenges. In HIV infection, virus replicating within the brain causes HIV-associated dementia (HAD) and milder forms of neurocognitive impairment in 20-30% of patients with unsuppressed viremia. HIV neurotropism is primarily determined by the viral envelope (env) gene. To identify amino acid signatures in the HIV env gene predictive of HAD, we developed a machine learning pipeline using the PART rule-learning algorithm and C4.5 decision tree inducer to train a classifier on a meta-dataset (n = 860 env sequences from 78 patients: 40 HAD, 38 non-HAD). To increase the flexibility and biological relevance of our analysis, we included 4 numeric factors describing amino acid hydrophobicity, polarity, bulkiness, and charge, in addition to amino acid identities. The classifier had 75% predictive accuracy in leave-one-out cross-validation, and identified 5 signatures associated with HAD diagnosis (p ... machine learning tools to analyze the genetics of rapidly evolving pathogens.

  7. Study on Red Coat Color Gene and Prediction of the Secondary Structure in Chinese Holstein

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The nucleotide sequence of the melanocortin-1-receptor (MC1R) gene was studied by polymerase chain reaction (PCR), the protein structure in Chinese Holstein was predicted, and the molecular mechanism of the red coat color was investigated. Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) was performed to genotype the individuals. Bioinformatics and biotechnology software packages were used to predict the secondary structure of MC1R. The results showed that EE was the dominant genotype in the Chinese Holstein Black and White herd, whereas ee was the dominant genotype in the Chinese Holstein Red and White herd. The secondary structure of the mutated MC1R protein was changed, and the deletion mutation caused an earlier termination of translation, which led to the formation of the red coat color. The allele E was mainly associated with the black coat color, whereas e was associated with red.

  8. Why does parental language input style predict child language development? A twin study of gene-environment correlation.

    Science.gov (United States)

    Dale, Philip S; Tosto, Maria Grazia; Hayiou-Thomas, Marianna E; Plomin, Robert

    2015-01-01

    There are well-established correlations between parental input style and child language development, which have typically been interpreted as evidence that the input style causes, or influences the rate of, changes in child language. We present evidence from a large twin study (TEDS; 8395 pairs for this report) that there are also likely to be both child-to-parent effects and shared genetic effects on parent and child. Self-reported parental language style at child age 3 and age 4 was aggregated into an 'informal language stimulation' factor and a 'corrective feedback' factor at each age; the former was positively correlated with child language concurrently and longitudinally at 3, 4, and 4.5 years, whereas the latter was weakly and negatively correlated. Both parental input factors were moderately heritable, as was child language. Longitudinal bivariate analysis showed that the correlation between the language stimulation factor and child language was significantly and moderately due to shared genes. There is some suggestive evidence from longitudinal phenotypic analysis that the prediction from parental language stimulation to child language includes both evocative and passive gene-environment correlation, with the latter playing a larger role. The reader will understand why correlations between parental language and rate of child language are by themselves ambiguous, and how twin studies can clarify the relationship. The reader will also understand that, based on the present study, at least two aspects of parental language style - informal language stimulation and corrective feedback - have substantial genetic influence, and that for informal language stimulation, a substantial portion of the prediction to child language represents the effect of shared genes on both parent and child. It will also be appreciated that these basic research findings do not imply that parental language input style is unimportant or that interventions cannot be effective.

  9. Comparative study of joint analysis of microarray gene expression data in survival prediction and risk assessment of breast cancer patients.

    Science.gov (United States)

    Yasrebi, Haleh

    2016-09-01

    Microarray gene expression data sets are jointly analyzed to increase statistical power. They could either be merged together or analyzed by meta-analysis. For a given ensemble of data sets, it cannot be foreseen which of these paradigms, merging or meta-analysis, works better. In this article, three joint analysis methods, Z-score normalization, ComBat and the inverse normal method (meta-analysis) were selected for survival prognosis and risk assessment of breast cancer patients. The methods were applied to eight microarray gene expression data sets, totaling 1324 patients with two clinical endpoints, overall survival and relapse-free survival. The performance derived from the joint analysis methods was evaluated using Cox regression for survival analysis and independent validation used as bias estimation. Overall, Z-score normalization had a better performance than ComBat and meta-analysis. Higher Area Under the Receiver Operating Characteristic curve and hazard ratio were also obtained when independent validation was used as bias estimation. With a lower time and memory complexity, Z-score normalization is a simple method for joint analysis of microarray gene expression data sets. The derived findings suggest further assessment of this method in future survival prediction and cancer classification applications.
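    As a rough illustration of the merging approach favored in this abstract, the sketch below standardizes each gene within each data set and then concatenates the samples. It is a minimal sketch under stated assumptions, not the author's pipeline: the data layout (gene name mapped to a list of per-sample values), the function names, and the use of the population standard deviation are all choices made here for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene within one data set (mean 0, SD 1)."""
    mu = mean(values)
    sd = pstdev(values)  # population SD; an assumption of this sketch
    if sd == 0:
        return [0.0 for _ in values]  # constant gene carries no signal
    return [(v - mu) / sd for v in values]


def merge_by_zscore(datasets):
    """Z-score each data set separately, then pool the samples.

    `datasets` is a list of dicts mapping gene -> list of expression
    values (one per sample). Only genes present in every data set are kept.
    """
    if not datasets:
        return {}
    shared = set(datasets[0])
    for data in datasets[1:]:
        shared &= set(data)
    merged = {gene: [] for gene in sorted(shared)}
    for data in datasets:
        for gene in merged:
            merged[gene].extend(zscore(data[gene]))
    return merged
```

    Because each data set is rescaled on its own, platform-specific offsets and scale differences are removed before pooling, which is the property that makes this simple method a candidate for joint analysis.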

  10. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  11. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    Directory of Open Access Journals (Sweden)

    Shibiao Wan

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.

  12. HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2014-01-01

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/. PMID:24647341

  13. ETS Gene Fusions as Predictive Biomarkers of Resistance to Radiation Therapy for Prostate Cancer

    Science.gov (United States)

    2011-08-01

    facilitates DNA end processing and rejoining in a multistep procedure that requires the XRCC4/DNA Ligase IV complex. In fact, XRCC4 and DNA Ligase IV are...Targeted disruption of the gene encoding DNA ligase IV leads to lethality in embryonic mice. Curr. Biol. 8, 1395–1398. Bennett, E.J., and Harper, J.W. (2008...1998). Late embryonic lethality and impaired V(D)J recombination in mice lacking DNA ligase IV. Nature 396, 173–177. Gallagher, D.J., Gaudet, M.M

  14. Improving the Prediction of Survival in Cancer Patients by Using Machine Learning Techniques: Experience of Gene Expression Data: A Narrative Review.

    Science.gov (United States)

    Bashiri, Azadeh; Ghazisaeedi, Marjan; Safdari, Reza; Shahmoradi, Leila; Ehtesham, Hamide

    2017-02-01

    Today, despite the many advances in early detection of diseases, cancer patients have a poor prognosis and their survival rates are low. Recently, microarray technologies have been used to gather thousands of data points on the gene expression levels of cancer cells. These data are the main indicators in survival prediction of cancer. This study highlights the improvement of survival prediction in cancer patients based on gene expression data by using machine learning techniques. This review was conducted by searching articles published from 2000 to 2016 in scientific databases and e-journals. We used keywords such as machine learning, gene expression data, survival and cancer. Studies have shown the high accuracy and effectiveness of gene expression data in comparison with clinical data in survival prediction. Because of the complexity and high volume of such data, studies have highlighted the importance of machine learning algorithms such as Artificial Neural Networks (ANN) for finding the distinctive signatures of gene expression in cancer patients. These algorithms improve the efficiency of probing and analyzing gene expression in cancer profiles for survival prediction. Given the capabilities of machine learning techniques in proteomics and genomics applications, developing clinical decision support systems based on these methods for analyzing gene expression data can prevent potential errors in survival estimation, provide appropriate and individualized treatments to patients and improve the prognosis of cancers.

  15. Prediction of heterogeneous differential genes by detecting outliers to a Gaussian tight cluster.

    Science.gov (United States)

    Yang, Zihua; Yang, Zhengrong

    2013-03-05

    Heterogeneously and differentially expressed genes (hDEG) are a common phenomenon due to biological diversity. A hDEG is often observed in gene expression experiments (with two experimental conditions) where it is highly expressed in a few experimental samples, or in drug trial experiments for cancer studies with drug resistance heterogeneity among the disease group. These highly expressed samples are called outliers. Accurate detection of outliers among hDEGs is then desirable for disease diagnosis and effective drug design. The standard approach for detecting hDEGs is to choose the appropriate subset of outliers to represent the experimental group. However, existing methods typically overlook hDEGs with very few outliers. We present in this paper a simple algorithm for detecting hDEGs by sequentially testing for potential outliers with respect to a tight cluster of non-outliers, among an ordered subset of the experimental samples. This avoids making any restrictive assumptions about how the outliers are distributed. We use simulated and real data to illustrate that the proposed algorithm achieves a good separation between the tight cluster of low expressions and the outliers for hDEGs. The proposed algorithm assesses each potential outlier in relation to the cluster of potential outliers without making explicit assumptions about the outlier distribution. Simulated examples and breast cancer data sets are used to illustrate the suitability of the proposed algorithm for identifying hDEGs with small numbers of outliers.
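    The general idea of testing ordered samples one at a time against a tight cluster can be sketched as follows. This is an illustrative stand-in, not the authors' algorithm: the abstract does not give the actual test statistic, so a simple z-score against a Gaussian fitted to the current cluster is assumed, and the names `split_outliers`, `z_cut` and `min_cluster` are hypothetical.

```python
from statistics import mean, stdev


def split_outliers(values, z_cut=3.0, min_cluster=3):
    """Split expression values into a tight low cluster and high outliers.

    Samples are visited in increasing order. Each candidate is compared
    with a Gaussian fitted to the cluster built so far; the first
    candidate lying more than `z_cut` SDs above the cluster mean, and
    everything above it, is reported as an outlier.
    """
    ordered = sorted(values)
    if len(ordered) <= min_cluster:
        return ordered, []
    cluster = ordered[:min_cluster]  # seed cluster: the lowest values
    for i in range(min_cluster, len(ordered)):
        candidate = ordered[i]
        mu = mean(cluster)
        sd = stdev(cluster)
        if sd == 0:
            is_outlier = candidate > mu  # degenerate cluster: any excess counts
        else:
            is_outlier = (candidate - mu) / sd > z_cut
        if is_outlier:
            return cluster, ordered[i:]
        cluster.append(candidate)
    return cluster, []
```

    Because the cluster is fitted only to the non-outlying samples, the sketch makes no assumption about how the outliers themselves are distributed, and it can flag a gene with as few as one extreme sample.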

  16. Prediction of Toxin Genes from Chinese Yellow Catfish Based on Transcriptomic and Proteomic Sequencing

    Directory of Open Access Journals (Sweden)

    Bing Xie

    2016-04-01

    Fish venom remains a virtually untapped resource. Few fish toxin sequences are available for reference, which makes it difficult to study toxins from venomous fish and to develop efficient and fast methods for identifying toxin genes or proteins. Here, we used Chinese yellow catfish (Pelteobagrus fulvidraco) as our research object, since it is a representative species in Siluriformes with its venom glands embedded in the pectoral and dorsal fins. In this study, we set up an in-house toxin database and a novel toxin-discovery protocol to identify toxin genes by combining transcriptomic and proteomic sequencing. Finally, we obtained 15 putative toxin proteins distributed in five groups, namely Veficolin, Ink toxin, Adamalysin, Za2G and CRISP toxin. It seems that we have developed a novel bioinformatics method, through which we could identify toxin proteins with high confidence. Meanwhile, these toxins can also be useful for comparative studies in other fish and development of potential drugs.

  17. Conservation-based prediction of the transcription regulatory region of the SCN1A gene

    Institute of Scientific and Technical Information of China (English)

    Yue-Sheng Long; Yi-Wu Shi; Wei-Ping Liao

    2009-01-01

    A challenge in identifying the transcription regulatory region is that the locations of eukaryotic transcriptional elements are often diverse among different genes. SCN1A, a disease-related sodium channel gene, has a complex 5'-untranslated region and diverse mRNA transcripts, which might be driven by different promoters. By cross-species sequence comparison and bioinformatics analysis, human 5'-untranslated exons were found to be conserved within the region of 200 kb upstream of the 5' flanking regions of SCN1A in higher mammals, but not in lower mammals and non-mammals. The core promoter elements (INR, DPE, and TATA) were found in the regions flanking different 5'-untranslated exons, suggesting that these sequences (from -45 to +35) might be targeted as core promoters. The nucleotide identity rates of these core promoter sequences are different, and the conservation level of the upstream region of each core promoter varies distinctly, implicating different regulatory mechanisms of the four promoters which exist in the nervous system.

  18. Sphingoid Base Metabolism in Yeast: Mapping Gene Expression Patterns Into Qualitative Metabolite Time Course Predictions

    OpenAIRE

    Tomas Radivoyevitch

    2001-01-01

    Can qualitative metabolite time course predictions be inferred from measured mRNA expression patterns? Speaking against this possibility is the large number of ‘decoupling’ control points that lie between these variables, i.e. translation, protein degradation, enzyme inhibition and enzyme activation. Speaking for it is the notion that these control points might be coordinately regulated such that action exerted on the mRNA level is informative of action exerted on the protein and me...

  19. Conserved synteny at the protein family level reveals genes underlying Shewanella species cold tolerance and predicts their novel phenotypes

    Energy Technology Data Exchange (ETDEWEB)

    Karpinets, Tatiana V.; Obraztsova, Anna; Wang, Yanbing; Schmoyer, Denise D.; Kora, Guruprasad; Park, Byung H.; Serres, Margrethe H.; Romine, Margaret F.; Land, Miriam L.; Kothe, Terence B.; Fredrickson, Jim K.; Nealson, Kenneth H.; Uberbacher, Edward

    2010-03-01

    Bacteria of the genus Shewanella can thrive in different environments and demonstrate significant variability in their metabolic and ecophysiological capabilities including cold and salt tolerance. Genomic characteristics underlying this variability across species are largely unknown. In this study we address the problem by a comparison of the physiological, metabolic and genomic characteristics of 19 sequenced Shewanella species. We have employed two novel approaches based on association of a phenotypic trait with the number of the trait-specific protein families (Pfam domains) and on the conservation of synteny (order in the genome) of the trait-related genes. Our first approach is top-down and involves experimental evaluation and quantification of the species’ cold tolerance followed by identification of the correlated Pfam domains and genes with a conserved synteny. The second, a bottom-up approach, predicts novel phenotypes of the species by calculating profiles of each Pfam domain among their genomes, followed by pair-wise correlation of the profiles and their network clustering. Using the first approach we find a link between cold and salt tolerance of the species and the presence in the genome of a Na+/H+ antiporter gene cluster. Other cold tolerance related genes include peptidases, chemotaxis sensory transducer proteins, a cysteine exporter, and helicases. Using the bottom-up approach we found several novel phenotypes in the newly sequenced Shewanella species, including degradation of aromatic compounds by an aerobic hybrid pathway in S. woodyi, degradation of ethanolamine by S. benthica, and propanediol degradation by S. putrefaciens CN32 and S. sp. W3-18-1.
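    The top-down step described here, associating a measured trait with per-genome Pfam domain counts, can be sketched with a plain Pearson correlation. The abstract does not say which association measure was used, so Pearson's r is an assumption of this sketch, as are the function names and the data layout (one trait value and one domain count per species, in the same species order).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile cannot be associated with the trait
    return cov / sqrt(var_x * var_y)


def rank_domains(trait, domain_counts):
    """Rank Pfam domains by |r| between their copy number and the trait.

    `trait` holds one phenotype score per species; `domain_counts` maps a
    domain identifier to its per-species copy numbers.
    """
    scores = {name: pearson(trait, counts) for name, counts in domain_counts.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

    Domains at the top of the ranking would then be the candidates to check for conserved gene order, which is the second half of the top-down approach in the abstract.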

  20. KIR Genes and Their Ligands Predict the Response to Anti-EGFR Monoclonal Antibodies in Solid Tumors.

    Science.gov (United States)

    Morales-Estevez, Cristina; De la Haba-Rodriguez, Juan; Manzanares-Martin, Barbara; Porras-Quintela, Ignacio; Rodriguez-Ariza, Antonio; Moreno-Vega, Alberto; Ortiz-Morales, Maria J; Gomez-España, Maria A; Cano-Osuna, Maria T; Lopez-Gonzalez, Javier; Chia-Delgado, Beatriz; Gonzalez-Fernandez, Rafael; Aranda-Aguilar, Enrique

    2016-01-01

    Killer-cell immunoglobulin-like receptors (KIRs) regulate the killing function of natural killer cells, which play an important role in the antibody-dependent cell-mediated cytotoxicity response exerted by therapeutic monoclonal antibodies (mAbs). However, it is unknown whether the extensive genetic variability of KIR genes and/or their human leukocyte antigen (HLA) ligands might influence the response to these treatments. This study aimed to explore whether the variability in KIR/HLA genes may be associated with the variable response observed to mAb-based anti-epidermal growth factor receptor (EGFR) therapies. Thirty-nine patients treated with anti-EGFR mAbs (trastuzumab for advanced breast cancer, or cetuximab for advanced colorectal or advanced head and neck cancer) were included in the study. All the patients had progressed to mAbs therapy and were grouped into two categories taking into account time to treatment failure (TTF ≤6 and ≥10 months). KIR genotyping (16 genetic variability) was performed in genomic DNA from peripheral blood by PCR sequence-specific primer technique, and HLA ligand typing was performed for HLA-B and -C loci by reverse polymerase chain reaction sequence-specific oligonucleotide methodology. Subjects carrying the KIR/HLA ligand combinations KIR2DS1/HLAC2C2-C1C2 and KIR3DS1/HLABw4w4-w4w6 showed longer TTF than non-carrier counterparts (14.76 vs. 3.73 months, p < 0.001, and 14.93 vs. 4.6 months, p = 0.005, respectively). No other significant differences were observed. Two activating KIR/HLA ligand combinations predict better response of patients to anti-EGFR therapy. These findings increase the overall knowledge on the role of specific gene variants related to responsiveness to anti-EGFR treatment in solid tumors and highlight the importance of assessing gene polymorphisms related to cancer medications.

  1. KIR Genes and their Ligands Predict the Response to Anti-EGFR Monoclonal Antibodies in Solid Tumors

    Directory of Open Access Journals (Sweden)

    Cristina Morales-Estevez

    2016-12-01

    Killer-cell immunoglobulin-like receptors (KIRs) regulate the killing function of NK cells, which play an important role in the antibody-dependent cell-mediated cytotoxicity (ADCC) response exerted by therapeutic monoclonal antibodies (mAbs). However, it is unknown whether the extensive genetic variability of KIR genes and/or their HLA ligands might influence the response to these treatments. This study aimed to explore whether the variability in KIR/HLA genes may be associated with the variable response observed to mAb-based anti-EGFR therapies. Thirty-nine patients treated with anti-EGFR mAbs (trastuzumab for advanced breast cancer, or cetuximab for advanced colorectal or advanced head and neck cancer) were included in the study. All the patients had progressed on mAb therapy and were grouped into two categories according to time to treatment failure (TTF ≤6 months and TTF ≥10 months). KIR genotyping (16 genetic variability) was performed in genomic DNA from peripheral blood by PCR sequence-specific primer technique and HLA ligand typing was performed for HLA-B and -C loci by reverse PCR-SSO methodology. Subjects carrying the KIR/HLA ligand combinations KIR2DS1/HLAC2C2-C1C2 and KIR3DS1/HLABw4w4-w4w6 showed longer TTF than their non-carrier counterparts (14.76 months vs 3.73 months, p<0.001, and 14.93 months vs 4.6 months, p=0.005, respectively). No other significant differences were observed. Two activating KIR/HLA ligand combinations predict better response of patients to anti-EGFR therapy. These findings increase the overall knowledge on the role of specific gene variants related to responsiveness to anti-EGFR treatment in solid tumours and highlight the importance of assessing gene polymorphisms related to cancer medications.

  2. PPARα gene variants as predicted performance-enhancing polymorphisms in professional Italian soccer players

    Directory of Open Access Journals (Sweden)

    Proia P

    2014-12-01

    Patrizia Proia,1 Antonino Bianco,1 Gabriella Schiera,2 Patrizia Saladino,2 Valentina Contrò,1 Giovanni Caramazza,3 Marcello Traina,1 Keith A Grimaldi,4 Antonio Palma,1 Antonio Paoli5 1Sport and Exercise Sciences Research Unit, 2Department of Biological, Chemical and Pharmaceutical Sciences and Technologies, University of Palermo, Palermo, Italy; 3Regional Sports School of CONI Sicilia, Sicily, Italy; 4Biomedical Engineering Laboratory, Institute of Communication and Computer Systems, National Technical University of Athens, Athens, Greece; 5Department of Biomedical Sciences, University of Padova, Padua, Italy Background: The PPARα gene encodes the peroxisome proliferator-activated receptor alpha, a central regulator of expression of other genes involved in fatty acid metabolism. The purpose of this study was to determine the prevalence of the G allele of the PPARα intron 7 G/C polymorphism (rs4253778) in professional Italian soccer players. Methods: Sixty professional soccer players and 30 sedentary volunteers were enrolled in the study. Samples of venous blood were obtained at rest, in the morning, by conventional clinical procedures; blood serum was collected and total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides were measured. An aliquot of anticoagulant-treated blood was used to prepare genomic DNA from whole blood. The G/C polymorphic site in PPARα intron 7 was scanned by using the PCR-RFLP (polymerase chain reaction restriction fragment length polymorphism) protocol with TaqI enzyme. Results: We found variations in genotype distribution of the PPARα polymorphism between professional soccer players and sedentary volunteers. In particular, G alleles and the GG genotype were significantly more frequent in soccer players compared with healthy controls (64% versus 48%). No significant correlations were found between lipid profile and genotype background. Conclusion: Previous results

  3. A combination of dopamine genes predicts success by professional Wall Street traders.

    Directory of Open Access Journals (Sweden)

    Steve Sapra

    What determines success on Wall Street? This study examined if genes affecting dopamine levels of professional traders were associated with their career tenure. Sixty professional Wall Street traders were genotyped and compared to a control group who did not trade stocks. We found that distinct alleles of the dopamine receptor 4 promoter (DRD4P) and catecholamine-O-methyltransferase (COMT) that affect synaptic dopamine were predominant in traders. These alleles are associated with moderate, rather than very high or very low, levels of synaptic dopamine. The activity of these alleles correlated positively with years spent trading stocks on Wall Street. Differences in personality and trading behavior were also correlated with allelic variants. This evidence suggests there may be a genetic basis for the traits that make one a successful trader.

  4. A combination of dopamine genes predicts success by professional Wall Street traders.

    Science.gov (United States)

    Sapra, Steve; Beavin, Laura E; Zak, Paul J

    2012-01-01

    What determines success on Wall Street? This study examined if genes affecting dopamine levels of professional traders were associated with their career tenure. Sixty professional Wall Street traders were genotyped and compared to a control group who did not trade stocks. We found that distinct alleles of the dopamine receptor 4 promoter (DRD4P) and catecholamine-O-methyltransferase (COMT) that affect synaptic dopamine were predominant in traders. These alleles are associated with moderate, rather than very high or very low, levels of synaptic dopamine. The activity of these alleles correlated positively with years spent trading stocks on Wall Street. Differences in personality and trading behavior were also correlated with allelic variants. This evidence suggests there may be a genetic basis for the traits that make one a successful trader.

  5. Metallothionein gene expression is altered in oral cancer and may predict metastasis and patient outcomes.

    Science.gov (United States)

    Brazão-Silva, Marco T; Rodrigues, Maria Fernandes S; Eisenberg, Ana Lúcia A; Dias, Fernando L; de Castro, Luciana M; Nunes, Fábio D; Faria, Paulo R; Cardoso, Sérgio V; Loyola, Adriano M; de Sousa, Suzana C O M

    2015-09-01

    Metallothioneins (MTs) are proteins associated with the carcinogenesis and prognosis of various tumours. Previous studies have shown their potential as biomarkers in oral squamous cell carcinoma (OSCC). Aiming to understand more clearly the function of MTs in OSCC we evaluated, for the first time, the gene expression profile of MTs in this neoplasm. Tissue samples from 35 cases of tongue and/or floor of mouth OSCC, paired with their corresponding non-neoplastic oral mucosa (NNOM), were retrieved (2007-09). All tissues were analysed for the following genes using TaqMan® reverse transcription-quantitative polymerase chain reaction (RT-qPCR) assays: MT1A, MT1B, MT1E, MT1F, MT1G, MT1H, MT1X, MT2A, MT3 and MT4. The expression of MT1B and MT1H was seldom detected in both OSCC and NNOM. A significant loss of MT1A, MT1X, MT3 and MT4 expression and gain of MT1F expression was observed in OSCC, compared to NNOM. Cases with MT1G down-regulation exhibited the worst prognoses. The up-regulation of MT1X was restricted to non-metastatic cases, whereas up-regulation of MT3 was related to cases with lymph node metastasis. Metallothionein mRNA expression is altered significantly in oral squamous cell carcinomas. The expression of MT1G, MT1X and MT3 may aid in the prognostic discrimination of OSCC cases. © 2015 John Wiley & Sons Ltd.

  6. Allele summation of diabetes risk genes predicts impaired glucose tolerance in female and obese individuals.

    Directory of Open Access Journals (Sweden)

    Katarzyna Linder

    INTRODUCTION: Single nucleotide polymorphisms (SNPs) in approximately 40 genes have been associated with an increased risk for type 2 diabetes (T2D) in genome-wide association studies. It is not known whether a similar genetic impact on the risk of prediabetes (impaired glucose tolerance [IGT] or impaired fasting glycemia [IFG]) exists. METHODS: In our cohort of 1442 non-diabetic subjects of European origin (normal glucose tolerance [NGT] n = 1046, isolated IFG n = 142, isolated IGT n = 140, IFG+IGT n = 114), an impact on glucose homeostasis has been shown for 9 SNPs in previous studies in this specific cohort. We analyzed these SNPs (within or in the vicinity of the genes TCF7L2, KCNJ11, HHEX, SLC30A8, WFS1, KCNQ1, MTNR1B, FTO, PPARG) for association with prediabetes. RESULTS: The genetic risk load was significantly associated with the risk for IGT (p = 0.0006) in a model including gender, age, BMI and insulin sensitivity. To further evaluate potential confounding effects, we stratified the population on gender, BMI and insulin sensitivity. The association of the risk score with IGT was present in female participants (p = 0.008), but not in male participants. The risk score was significantly associated with IGT (p = 0.008) in subjects with a body mass index higher than 30 kg/m² but not in non-obese individuals. Furthermore, only in insulin resistant subjects was a significant association between the genetic load and the risk for IGT (p = 0.01) found. DISCUSSION: We found that T2D genetic risk alleles cause an increased risk for IGT. This effect was not present in male, lean and insulin sensitive subjects, suggesting a protective role of beneficial environmental factors on the genetic risk.
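
    As an illustration of what an allele-summation score can look like, the following minimal Python sketch adds up the number of risk alleles a subject carries across the nine loci named above. It assumes an unweighted sum of 0, 1 or 2 risk alleles per locus; the abstract does not spell out the exact scoring, and the function name and example genotypes are hypothetical.

```python
from typing import Dict

# The nine loci named in the abstract. Everything else here is illustrative.
RISK_LOCI = [
    "TCF7L2", "KCNJ11", "HHEX", "SLC30A8", "WFS1",
    "KCNQ1", "MTNR1B", "FTO", "PPARG",
]


def allele_sum_score(risk_allele_counts: Dict[str, int]) -> int:
    """Return the unweighted sum of risk alleles (0, 1 or 2 per locus)."""
    total = 0
    for locus in RISK_LOCI:
        count = risk_allele_counts[locus]
        if count not in (0, 1, 2):
            raise ValueError(f"{locus}: expected 0, 1 or 2 risk alleles, got {count}")
        total += count
    return total


# Hypothetical subject: heterozygous everywhere, except two risk alleles
# at TCF7L2 and none at FTO.
example_subject = {locus: 1 for locus in RISK_LOCI}
example_subject["TCF7L2"] = 2
example_subject["FTO"] = 0

print(allele_sum_score(example_subject))  # 7 loci x 1, plus 2, plus 0 = 9
```

    With nine loci the score ranges from 0 to 18; in a study like the one above such a score would then enter a regression model alongside covariates such as age and BMI.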

  7. k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction.

    Science.gov (United States)

    Parry, R M; Jones, W; Stokes, T H; Phan, J H; Moffitt, R A; Fang, H; Shi, L; Oberthuer, A; Fischer, M; Tong, W; Wang, M D

    2010-08-01

    In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, neuroblastoma and multiple myeloma, we systematically generated 463,320 KNN models by varying feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. We identified factors that contribute to the MAQC-II project performance variation, and validated a KNN data analysis protocol using a newly generated clinical data set with 478 neuroblastoma patients. We interpreted the biological and practical significance of the derived KNN models, and compared their performance with existing clinical factors.
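
    To make the tunable choices listed above concrete (distance metric, number of neighbors, vote weighting and decision threshold), here is a minimal pure-Python sketch of a binary KNN classifier. It is not the MAQC-II protocol; the function names, the inverse-distance weighting scheme and the toy data are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label 0 or 1)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_positive_score(
    train: List[Sample],
    query: Sequence[float],
    k: int = 3,
    distance: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
    weighted: bool = False,
) -> float:
    """Share of the (optionally inverse-distance-weighted) votes cast for class 1."""
    neighbours = sorted(train, key=lambda s: distance(s[0], query))[:k]
    positive = 0.0
    total = 0.0
    for features, label in neighbours:
        # Assumed weighting scheme: 1/distance, with a small constant to avoid
        # division by zero when the query coincides with a training point.
        weight = 1.0 / (distance(features, query) + 1e-9) if weighted else 1.0
        total += weight
        if label == 1:
            positive += weight
    return positive / total


def knn_predict(train: List[Sample], query: Sequence[float],
                threshold: float = 0.5, **kwargs) -> int:
    """Call class 1 when the vote share reaches the decision threshold."""
    return 1 if knn_positive_score(train, query, **kwargs) >= threshold else 0


# Hypothetical two-feature toy data.
toy_train: List[Sample] = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0),
    ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.2, 0.8), 1),
]
toy_query = (0.2, 0.1)

print(knn_positive_score(toy_train, toy_query, k=3))    # one of three votes is class 1
print(knn_predict(toy_train, toy_query, threshold=0.5))  # 0
print(knn_predict(toy_train, toy_query, threshold=0.3))  # 1
```

    The last two calls show how the decision threshold alone can flip a prediction, which is one reason why varying these settings can produce large performance differences.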

  8. Radiation-induced gene expression in human subcutaneous fibroblasts is predictive of radiation-induced fibrosis

    DEFF Research Database (Denmark)

    Rødningen, Olaug Kristin; Børresen-Dale, Anne-Lise; Alsner, Jan

    2008-01-01

    BACKGROUND AND PURPOSE: Breast cancer patients show a large variation in normal tissue reactions after ionizing radiation (IR) therapy. One of the most common long-term adverse effects of ionizing radiotherapy is radiation-induced fibrosis (RIF), and several attempts have been made over the last years to develop predictive assays for RIF. Our aim was to identify basal and radiation-induced transcriptional profiles in fibroblasts from breast cancer patients that might be related to the individual risk of RIF in these patients. MATERIALS AND METHODS: Fibroblast cell lines from 31 individuals with variable risk of RIF (grouped into five classes from low to high risk) were irradiated with two different schemes: 1x3.5Gy with RNA isolated 2 and 24h after irradiation, and a fractionated scheme with 3x3.5Gy in intervals of 24h with RNA isolated 2h after the last dose. RNA was also isolated from non...

  9. Gene expression signatures that predict outcome of tamoxifen-treated estrogen receptor-positive, high-risk, primary breast cancer patients: a DBCG study.

    Directory of Open Access Journals (Sweden)

    Maria B Lyng

    BACKGROUND: Tamoxifen significantly improves outcome for estrogen receptor-positive (ER+) breast cancer, but the 15-year recurrence rate remains 30%. The aim of this study was to identify gene profiles that accurately predicted the outcome of ER+ breast cancer patients who received adjuvant Tamoxifen mono-therapy. METHODOLOGY/PRINCIPAL FINDINGS: Post-menopausal breast cancer patients diagnosed no later than 2002, being ER+ as defined by >1% IHC staining and having a frozen tumor sample with >50% tumor content were included. Tumor samples from 108 patients treated with adjuvant Tamoxifen were analyzed for the expression of 59 genes using quantitative-PCR. End-point was clinically verified recurrence to distant organs or ipsilateral breast. Gene profiles were identified using a model building procedure based on conditional logistic regression and leave-one-out cross-validation, followed by a non-parametric bootstrap (1000x re-sampling). The optimal profiles were further examined in 5 previously-reported datasets containing similar patient populations that were either treated with Tamoxifen or left untreated (n = 623). Three gene signatures were identified, the strongest being a 2-gene combination of BCL2-CDKN1A, exhibiting an accuracy of 75% for prediction of outcome. Independent examination using 4 previously-reported microarray datasets of Tamoxifen-treated patient samples (n = 503) confirmed the potential of BCL2-CDKN1A. The predictive value was further determined by comparing the ability of the genes to predict recurrence in an additional, previously-published, cohort consisting of Tamoxifen-treated (n = 58, p = 0.015) and untreated patients (n = 62, p = 0.25). CONCLUSIONS/SIGNIFICANCE: A novel gene expression signature predictive of outcome of Tamoxifen-treated patients was identified. The validation suggests that BCL2-CDKN1A exhibits promising predictive potential.
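
    The leave-one-out cross-validation step mentioned above can be sketched in a few lines of Python. This is a generic illustration only: a toy one-feature midpoint classifier stands in for the study's conditional logistic regression, and all names and data values are hypothetical.

```python
from typing import Callable, List, Sequence


def fit_midpoint(values: Sequence[float], labels: Sequence[int]) -> float:
    """Toy model: the cut-off is the midpoint between the two class means."""
    mean0 = sum(v for v, l in zip(values, labels) if l == 0) / labels.count(0)
    mean1 = sum(v for v, l in zip(values, labels) if l == 1) / labels.count(1)
    return (mean0 + mean1) / 2.0


def predict_midpoint(cutoff: float, value: float) -> int:
    return 1 if value > cutoff else 0


def loocv_accuracy(
    values: List[float],
    labels: List[int],
    fit: Callable[[Sequence[float], Sequence[int]], float] = fit_midpoint,
    predict: Callable[[float, float], int] = predict_midpoint,
) -> float:
    """Hold out each sample once, train on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(values)):
        train_values = values[:i] + values[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit(train_values, train_labels)
        if predict(model, values[i]) == labels[i]:
            correct += 1
    return correct / len(values)


# Hypothetical expression values; the last sample is an outlier for its class.
toy_values = [1.0, 1.2, 0.9, 3.0, 3.2, 2.8, 1.1]
toy_labels = [0, 0, 0, 1, 1, 1, 1]

print(loocv_accuracy(toy_values, toy_labels))  # 6 of 7 held-out samples are called correctly
```

    Because each sample is scored by a model that never saw it, the resulting accuracy is less optimistic than one computed on the training data itself.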

  10. Putative extremely high rate of proteome innovation in lancelets might be explained by high rate of gene prediction errors.

    Science.gov (United States)

    Bányai, László; Patthy, László

    2016-08-01

    A recent analysis of the genomes of Chinese and Florida lancelets has concluded that the rate of creation of novel protein domain combinations is orders of magnitude greater in lancelets than in other metazoa and it was suggested that continuous activity of transposable elements in lancelets is responsible for this increased rate of protein innovation. Since morphologically Chinese and Florida lancelets are highly conserved, this finding would contradict the observation that high rates of protein innovation are usually associated with major evolutionary innovations. Here we show that the conclusion that the rate of proteome innovation is exceptionally high in lancelets may be unjustified: the differences observed in domain architectures of orthologous proteins of different amphioxus species probably reflect high rates of gene prediction errors rather than true innovation.

  11. Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods

    Directory of Open Access Journals (Sweden)

    Mark Burton

    2012-01-01

    Machine learning has increasingly been used with microarray gene expression data and for the development of classifiers using a variety of methods. However, method comparisons in cross-study datasets are very scarce. This study compares the performance of seven classification methods and the effect of voting for predicting metastasis outcome in breast cancer patients, in three situations: within the same dataset or across datasets on similar or dissimilar microarray platforms. Combining classification results from seven classifiers into one voting decision performed significantly better during internal validation as well as external validation in similar microarray platforms than the underlying classification methods. When validating between different microarray platforms, random forest, another voting-based method, proved to be the best performing method. We conclude that voting based classifiers provided an advantage with respect to classifying metastasis outcome in breast cancer patients.
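
    Combining several classifiers into one voting decision can be illustrated with a simple majority vote. The sketch below assumes binary labels and an unweighted vote; the abstract does not state the exact voting rule, and the function name and example predictions are hypothetical.

```python
from typing import List


def majority_vote(predictions: List[int]) -> int:
    """Return 1 if more than half of the binary predictions are 1, else 0."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return 1 if 2 * sum(predictions) > len(predictions) else 0


# Hypothetical calls from seven classifiers for one patient
# (1 = metastasis predicted, 0 = no metastasis predicted).
seven_calls = [1, 0, 1, 1, 0, 1, 0]

print(majority_vote(seven_calls))  # four of seven vote 1, so the result is 1
```

    With an odd number of binary classifiers, such as seven, a tie cannot occur, which keeps the rule unambiguous.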

  12. A machine learning approach for identifying amino acid signatures in the HIV env gene predictive of dementia.

    Directory of Open Access Journals (Sweden)

    Alexander G Holman

    The identification of nucleotide sequence variations in viral pathogens linked to disease and clinical outcomes is important for developing vaccines and therapies. However, identifying these genetic variations in rapidly evolving pathogens adapting to selection pressures unique to each host presents several challenges. Machine learning tools provide new opportunities to address these challenges. In HIV infection, virus replicating within the brain causes HIV-associated dementia (HAD) and milder forms of neurocognitive impairment in 20-30% of patients with unsuppressed viremia. HIV neurotropism is primarily determined by the viral envelope (env) gene. To identify amino acid signatures in the HIV env gene predictive of HAD, we developed a machine learning pipeline using the PART rule-learning algorithm and C4.5 decision tree inducer to train a classifier on a meta-dataset (n = 860 env sequences from 78 patients: 40 HAD, 38 non-HAD). To increase the flexibility and biological relevance of our analysis, we included 4 numeric factors describing amino acid hydrophobicity, polarity, bulkiness, and charge, in addition to amino acid identities. The classifier had 75% predictive accuracy in leave-one-out cross-validation, and identified 5 signatures associated with HAD diagnosis (p<0.05, Fisher's exact test). These HAD signatures were found in the majority of brain sequences from 8 of 10 HAD patients from an independent cohort. Additionally, 2 HAD signatures were validated against env sequences from CSF of a second independent cohort. This analysis provides insight into viral genetic determinants associated with HAD, and develops novel methods for applying machine learning tools to analyze the genetics of rapidly evolving pathogens.

  13. Severity of phenotype in cystinosis varies with mutations in the CTNS gene: predicted effect on the model of cystinosin.

    Science.gov (United States)

    Attard, M; Jean, G; Forestier, L; Cherqui, S; van't Hoff, W; Broyer, M; Antignac, C; Town, M

    1999-12-01

    Infantile nephropathic cystinosis is a rare, autosomal recessive disease caused by a defect in the transport of cystine across the lysosomal membrane and characterized by early onset of renal proximal tubular dysfunction. Late-onset cystinosis, a rarer form of the disorder, is characterized by onset of symptoms between 12 and 15 years of age. We previously characterized the cystinosis gene, CTNS, and identified pathogenic mutations in patients with infantile nephropathic cystinosis, including a common, approximately 65 kb deletion which encompasses exons 1-10. Structure predictions suggested that the gene product, cystinosin, is a novel integral lysosomal membrane protein. We now examine the predicted effect of mutations on this model of cystinosin. In this study, we screened patients with infantile nephropathic cystinosis, those with late-onset cystinosis and patients whose phenotype does not fit the classical definitions. We found 23 different mutations in CTNS; 14 are novel mutations. Out of 25 patients with infantile nephropathic cystinosis, 12 have two severely truncating mutations, which is consistent with a loss of functional protein, and 13 have missense or in-frame deletions, which would result in disruption of transmembrane domains and loss of protein function. Mutations found in two late-onset patients affect functionally unimportant regions of cystinosin, which accounts for their milder phenotype. For three patients, the age of onset of cystinosis was <7 years but the course of the disease was milder than the infantile nephropathic form. This suggests that the missense mutations found in these individuals allow production of functional protein and may also indicate regions of cystinosin which are not functionally important.

  14. A Common Variant in the SETD7 Gene Predicts Serum Lycopene Concentrations.

    Science.gov (United States)

    D'Adamo, Christopher R; D'Urso, Antonietta; Ryan, Kathleen A; Yerges-Armstrong, Laura M; Semba, Richard D; Steinle, Nanette I; Mitchell, Braxton D; Shuldiner, Alan R; McArdle, Patrick F

    2016-02-06

    Dietary intake and higher serum concentrations of lycopene have been associated with lower incidence of prostate cancer and other chronic diseases. Identifying determinants of serum lycopene concentrations may thus have important public health implications. Prior studies have suggested that serum lycopene concentrations are under partial genetic control. The goal of this research was to identify genetic predictors of serum lycopene concentrations using the genome-wide association study (GWAS) approach among a sample of 441 Old Order Amish adults that consumed a controlled diet. Linear regression models were utilized to evaluate associations between genetic variants and serum concentrations of lycopene. Variant rs7680948 on chromosome 4, located in the intron region of the SETD7 gene, was significantly associated with serum lycopene concentrations (p = 3.41 × 10⁻⁹). Our findings also provided nominal support for the association previously noted between SCARB1 and serum lycopene concentrations, although with a different SNP (rs11057841) in the region. This study identified a novel locus associated with serum lycopene concentrations and our results raise a number of intriguing possibilities regarding the nature of the relationship between SETD7 and lycopene, both of which have been independently associated with prostate cancer. Further investigation into this relationship might help provide greater mechanistic understanding of these associations.

  15. A synthetic library of RNA control modules for predictable tuning of gene expression in yeast.

    Science.gov (United States)

    Babiskin, Andrew H; Smolke, Christina D

    2011-03-01

    Advances in synthetic biology have resulted in the development of genetic tools that support the design of complex biological systems encoding desired functions. The majority of efforts have focused on the development of regulatory tools in bacteria, whereas fewer tools exist for the tuning of expression levels in eukaryotic organisms. Here, we describe a novel class of RNA-based control modules that provide predictable tuning of expression levels in the yeast Saccharomyces cerevisiae. A library of synthetic control modules that act through posttranscriptional RNase cleavage mechanisms was generated through an in vivo screen, in which structural engineering methods were applied to enhance the insulation and modularity of the resulting components. This new class of control elements can be combined with any promoter to support titration of regulatory strategies encoded in transcriptional regulators and thus more sophisticated control schemes. We applied these synthetic controllers to the systematic titration of flux through the ergosterol biosynthesis pathway, providing insight into endogenous control strategies and highlighting the utility of this control module library for manipulating and probing biological systems.

  16. Transcription Profiles of Marker Genes Predict The Transdifferentiation Relationship between Eight Types of Liver Cell during Rat Liver Regeneration

    Directory of Open Access Journals (Sweden)

    Xiaguang Chen

    2015-07-01

    Objective: To investigate the transdifferentiation relationship between eight types of liver cell during rat liver regeneration (LR). Materials and Methods: 114 healthy Sprague-Dawley (SD) rats were used in this experimental study. Eight types of liver cell were isolated and purified with Percoll density gradient centrifugation and immunomagnetic bead methods. Marker genes for eight types of cell were obtained by retrieving the relevant references and databases. Expression changes of markers for each cell of the eight cell types were measured using microarray. The relationships between the expression profiles of marker genes and transdifferentiation among liver cells were analyzed using bioinformatics. Liver cell transdifferentiation was predicted by comparing expression profiles of marker genes in different liver cells. Results: During LR, hepatocytes (HCs) not only express hepatic oval cell (HOC) markers (including PROM1, KRT14 and LY6E), but also express biliary epithelial cell (BEC) markers (including KRT7 and KRT19); BECs express both HOC markers (including GABRP, PCNA and THY1) and HC markers such as CPS1, TAT, KRT8 and KRT18; both HC markers (KRT18, KRT8 and WT1) and BEC markers (KRT7 and KRT19) were detected in HOCs. Additionally, some HC markers were also significantly upregulated in hepatic stellate cells (HSCs), sinusoidal endothelial cells (SECs), Kupffer cells (KCs) and dendritic cells (DCs), mainly at 6-72 hours post partial hepatectomy (PH). Conclusion: Our findings indicate that there is a mutual transdifferentiation relationship between HC, BEC and HOC during LR, and a tendency for HSCs, SECs, KCs and DCs to transdifferentiate into HCs.

  17. A Computational Protein Phenotype Prediction Approach to Analyze the Deleterious Mutations of Human MED12 Gene.

    Science.gov (United States)

    Banaganapalli, Babajan; Mohammed, Kaleemuddin; Khan, Imran Ali; Al-Aama, Jumana Y; Elango, Ramu; Shaik, Noor Ahmad

    2016-09-01

    Genetic mutations in MED12, a subunit of the Mediator complex, are seen in a broad spectrum of human diseases. However, the underlying basis of how these pathogenic mutations elicit protein phenotype changes in terms of 3D structure, stability and protein binding sites remains unknown. Therefore, we aimed to investigate the structural and functional impacts of MED12 mutations, using computational methods as an alternative to traditional in vivo and in vitro approaches. Details of the MED12 gene mutations and their corresponding clinical associations were collected from different databases and by text-mining. Initially, diverse computational approaches were applied to categorize the different classes of mutations based on their deleterious impact on MED12. Then, protein structures for wild and mutant types built by integrative modeling were analyzed for structural divergence, solvent accessibility, stability, and functional interaction deformities. Finally, this study was able to identify that genetic mutations mapped to the exon-2 region and the highly conserved LCEWAV and Catenin domains induce biochemically severe amino acid changes which alter the protein phenotype as well as the stability of MED12-CYCC interactions. To better understand the deleterious nature of FS-IDs and Indels, this study asserts the utility of computational screening based on their propensity towards non-sense mediated decay. Current study findings may help to narrow down the number of MED12 mutations to be screened for mediator complex dysfunction associated genetic diseases. This study supports computational methods as a primary filter to verify the plausible impact of pathogenic mutations based on the perspective of evolution, expression and phenotype of proteins. J. Cell. Biochem. 117: 2023-2035, 2016. © 2016 Wiley Periodicals, Inc.

  18. Application of gene expression programming and neural networks to predict adverse events of radical hysterectomy in cervical cancer patients.

    Science.gov (United States)

    Kusy, Maciej; Obrzut, Bogdan; Kluska, Jacek

    2013-12-01

    The aim of this article was to compare the gene expression programming (GEP) method with three types of neural networks in the prediction of adverse events of radical hysterectomy in cervical cancer patients. One hundred and seven patients treated by radical hysterectomy were analyzed. Each record representing a single patient consisted of 10 parameters. The occurrence and lack of perioperative complications imposed a two-class classification problem. In the simulations, the GEP algorithm was compared to a multilayer perceptron (MLP), a radial basis function neural network, and a probabilistic neural network. The generalization ability of the models was assessed on the basis of their accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). The GEP classifier provided the best results in the prediction of adverse events, with an accuracy of 71.96%. Comparable but slightly worse outcomes were obtained using MLP, i.e., 71.87%. For each of the measured indices (accuracy, sensitivity, specificity, and AUROC), the standard deviation was the smallest for the models generated by the GEP classifier.
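
    For readers less familiar with the indices named above, the following minimal Python sketch computes accuracy, sensitivity and specificity for a two-class problem from true and predicted labels. The function name and the example labels are hypothetical and are not data from the study.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, with 1 = adverse event, 0 = none."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true events that were detected
        "specificity": tn / (tn + fp),  # share of non-events correctly cleared
    }


# Hypothetical labels for eight patients.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print(binary_metrics(toy_true, toy_pred))
```

    In this toy case three of four events and three of four non-events are classified correctly, so all three indices equal 0.75.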

  19. A meta-analysis of gene expression-based biomarkers predicting outcome after tamoxifen treatment in breast cancer.

    Science.gov (United States)

    Mihály, Zsuzsanna; Kormos, Máté; Lánczky, András; Dank, Magdolna; Budczies, Jan; Szász, Marcell A; Győrffy, Balázs

    2013-07-01

    To date, three molecular markers (ER, PR, and CYP2D6) have been used in clinical setting to predict the benefit of the anti-estrogen tamoxifen therapy. Our aim was to validate new biomarker candidates predicting response to tamoxifen treatment in breast cancer by evaluating these in a meta-analysis of available transcriptomic datasets with known treatment and follow-up. Biomarker candidates were identified in Pubmed and in the 2007-2012 ASCO and 2011-2012 SABCS abstracts. Breast cancer microarray datasets of endocrine therapy-treated patients were downloaded from GEO and EGA and RNAseq datasets from TCGA. Of the biomarker candidates, only those identified or already validated in a clinical cohort were included. Relapse-free survival (RFS) up to 5 years was used as endpoint in a ROC analysis in the GEO and RNAseq datasets. In the EGA dataset, Kaplan-Meier analysis was performed for overall survival. Statistical significance was set at p tamoxifen-resistance genes in three independent platforms and identified PGR, MAPT, and SLC7A5 as the most promising prognostic biomarkers in tamoxifen treated patients.

  20. Channelopathy-related SCN10A gene variants predict cerebellar dysfunction in multiple sclerosis.

    Science.gov (United States)

    Roostaei, Tina; Sadaghiani, Shokufeh; Park, Min Tae M; Mashhadi, Rahil; Nazeri, Aria; Noshad, Sina; Salehi, Mohammad Javad; Naghibzadeh, Maryam; Moghadasi, Abdorreza Naser; Owji, Mahsa; Doosti, Rozita; Taheri, Amir Pejman Hashemi; Rad, Ali Shakouri; Azimi, Amirreza; Chakravarty, M Mallar; Voineskos, Aristotle N; Nazeri, Arash; Sahraian, Mohammad Ali

    2016-02-02

    To determine the motor-behavioral and neural correlates of putative functional common variants in the sodium-channel NaV1.8 encoding gene (SCN10A) in vivo in patients with multiple sclerosis (MS). We recruited 161 patients with relapsing-onset MS and 94 demographically comparable healthy participants. All patients with MS underwent structural MRI and clinical examinations (Expanded Disability Status Scale [EDSS] and Multiple Sclerosis Functional Composite [MSFC]). Whole-brain voxel-wise and cerebellar volumetry were performed to assess differences in regional brain volumes between genotype groups. Resting-state fMRI was acquired from 62 patients with MS to evaluate differences in cerebellar functional connectivity. All participants were genotyped for 4 potentially functional SCN10A polymorphisms. Two SCN10A polymorphisms in high linkage disequilibrium (r² = 0.95) showed significant association with MSFC performance in patients with MS (rs6795970: p = 6.2 × 10⁻⁴; rs6801957: p = 0.0025). Patients with MS with rs6795970(AA) genotype performed significantly worse than rs6795970(G) carriers in MSFC (p = 1.8 × 10⁻⁴) and all of its subscores. This association was independent of EDSS and cerebellar atrophy. Although the genotype groups showed no difference in regional brain volumes, rs6795970(AA) carriers demonstrated significantly diminished cerebellar functional connectivity with the thalami and midbrain. No significant SCN10A-genotype effect was observed on MSFC performance in healthy participants. Our data suggest that SCN10A variation substantially influences functional status, including prominent effects on motor coordination in patients with MS. These findings were supported by the effects of this variant on a neural system important for motor coordination, namely cerebello-thalamic circuitry. Overall, our findings add to the emerging evidence that suggests that sodium channel NaV1.8 could serve as a target for future drug-based interventions to treat

  1. Stoichiometric Representation of Gene–Protein–Reaction Associations Leverages Constraint-Based Analysis from Reaction to Gene-Level Phenotype Prediction

    DEFF Research Database (Denmark)

    Machado, Daniel; Herrgard, Markus; Rocha, Isabel

    2016-01-01

    Genome-scale metabolic reconstructions are currently available for hundreds of organisms. Constraint-based modeling enables the analysis of the phenotypic landscape of these organisms, predicting the response to genetic and environmental perturbations. However, since constraint-based models can only describe the metabolic phenotype at the reaction level, understanding the mechanistic link between genotype and phenotype is still hampered by the complexity of gene-protein-reaction associations. We implement a model transformation that enables constraint-based methods to be applied at the gene level by explicitly accounting for the individual fluxes of enzymes (and subunits) encoded by each gene. We show how this can be applied to different kinds of constraint-based analysis: flux distribution prediction, gene essentiality analysis, random flux sampling, elementary mode analysis...
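
    To illustrate the gene-protein-reaction (GPR) associations mentioned in the abstract above, the following is a minimal sketch of the conventional Boolean evaluation of GPR rules after a gene knockout. It is not the stoichiometric model transformation the authors describe. The rule set `GPR_RULES`, the reaction and gene names, and both helper functions are hypothetical and chosen only for illustration.

```python
# Hypothetical toy GPR rules (assumption, not taken from the paper).
# Each reaction maps to a list of enzyme complexes (isozymes, combined with OR);
# each complex is a set of genes encoding its subunits (combined with AND).
GPR_RULES = {
    "R1": [{"g1"}],            # catalysed by a single gene product
    "R2": [{"g2", "g3"}],      # enzyme complex: g2 AND g3
    "R3": [{"g4"}, {"g5"}],    # isozymes: g4 OR g5
}


def active_reactions(gpr_rules, knocked_out):
    """Return the reactions that can still be catalysed after a knockout."""
    knocked_out = set(knocked_out)
    active = set()
    for reaction, isozymes in gpr_rules.items():
        # A reaction stays active if at least one isozyme keeps all its subunits.
        if any(subunits.isdisjoint(knocked_out) for subunits in isozymes):
            active.add(reaction)
    return active


def lost_reactions(gpr_rules, knocked_out):
    """Return the reactions disabled by knocking out the given genes."""
    return set(gpr_rules) - active_reactions(gpr_rules, knocked_out)
```

    In this sketch, knocking out one subunit of a complex disables its reaction, while knocking out a single isozyme does not. A Boolean mapping of this kind only switches whole reactions on or off, which is the reaction-level limitation the abstract says the gene-level transformation is meant to address.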

  2. Development and Validation of a Gene-Based Model for Outcome Prediction in Germ Cell Tumors Using a Combined Genomic and Expression Profiling Approach.

    Directory of Open Access Journals (Sweden)

    James E Korkola

    Germ Cell Tumors (GCT) have a high cure rate, but we currently lack the ability to accurately identify the small subset of patients who will die from their disease. We used a combined genomic and expression profiling approach to identify genomic regions and underlying genes that are predictive of outcome in GCT patients. We performed array-based comparative genomic hybridization (CGH) on 53 non-seminomatous GCTs (NSGCTs) treated with cisplatin-based chemotherapy and defined altered genomic regions using Circular Binary Segmentation. We identified 14 regions associated with two-year disease-free survival (2yDFS) and 16 regions associated with five-year disease-specific survival (5yDSS). From corresponding expression data, we identified 101 probe sets that showed significant changes in expression. We built several models based on these differentially expressed genes, then tested them in an independent validation set of 54 NSGCTs. These predictive models correctly classified outcome in 64-79.6% of patients in the validation set, depending on the endpoint utilized. Survival analysis demonstrated a significant separation of patients with good versus poor predicted outcome when using a combined gene set model. Multivariate analysis using clinical risk classification with the combined gene model indicated that they were independent prognostic markers. This novel set of predictive genes from altered genomic regions is almost entirely independent of our previously identified set of predictive genes for patients with NSGCTs. These genes may aid in the identification of the small subset of patients who are at high risk of poor outcome.

  3. A p53-regulated apoptotic gene signature predicts treatment response and outcome in pediatric acute lymphoblastic leukemia

    Directory of Open Access Journals (Sweden)

    Bainer RO

    2017-09-01

    Russell O Bainer,1 Matthew R Trendowski,2 Cheng Cheng,3 Deqing Pei,3 Wenjian Yang,3 Steven W Paugh,4 Kathleen H Goss,5 Andrew D Skol,6 Paul Pavlidis,7 Ching-Hon Pui,4,8 T Conrad Gilliam,1 William E Evans,4,9,* Kenan Onel10–13,* 1Department of Human Genetics, 2Department of Medicine, Section of Hematology/Oncology, The University of Chicago, Chicago, IL, 3Department of Biostatistics, 4Hematological Malignancy Program, St Jude Children’s Research Hospital, Memphis, TN, 5University of Chicago Medicine Comprehensive Cancer Center, 6Department of Pediatrics, The University of Chicago, Chicago, IL, USA; 7Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada; 8Department of Oncology, 9Department of Pharmaceutical Sciences, St Jude Children’s Research Hospital, Memphis, TN, 10Division of Human Genetics and Genomics, 11Division of Hematology/Oncology and Stem Cell Transplantation, Cohen Children’s Medical Center, New Hyde Park, 12The Feinstein Institute for Medical Research, Manhasset, NY, 13Hofstra Northwell School of Medicine, Hofstra University, Hempstead, NY, USA *These authors contributed equally to this work. Abstract: Gene signatures have been associated with outcome in pediatric acute lymphoblastic leukemia (ALL) and other malignancies. However, determining the molecular drivers of these expression changes remains challenging. In ALL blasts, the p53 tumor suppressor is the primary regulator of the apoptotic response to genotoxic chemotherapy, which is predictive of outcome. Consequently, we hypothesized that the normal p53-regulated apoptotic response to DNA damage would be altered in ALL and that this alteration would influence drug response and treatment outcome. To test this, we first used global expression profiling in related human B-lineage lymphoblastoid cell lines with either wild type or mutant TP53 to characterize the normal p53-mediated transcriptional response to ionizing radiation (IR) and identified

  4. Predicting in vivo gene expression in macrophages after exposure to benzo(a)pyrene based on in vitro assays and toxicokinetic/toxicodynamic models

    OpenAIRE

    Péry, Alexandre R R; Brochot, Céline; Desmots, Sophie; Boize, Magali; Sparfel, Lydie; Fardel, Olivier

    2011-01-01

    Predictive toxicology aims at developing methodologies to relate the results obtained from in vitro experiments to in vivo exposure. In the case of polycyclic aromatic hydrocarbons (PAHs), a substantial amount of knowledge on effects and modes of action has been recently obtained from in vitro studies of gene expression. In the current study, we built a physiologically based toxicokinetic (PBTK) model to relate in vivo and in vitro gene expression in case of exposure t...

  5. Progress and challenges in the computational prediction of gene function using networks [v1; ref status: indexed, http://f1000r.es/SqmJUM]

    Directory of Open Access Journals (Sweden)

    Paul Pavlidis

    2012-09-01

    In this opinion piece, we attempt to unify recent arguments we have made that serious confounds affect the use of network data to predict and characterize gene function. The development of computational approaches to determine gene function is a major strand of computational genomics research. However, progress beyond using BLAST to transfer annotations has been surprisingly slow. We have previously argued that a large part of the reported success in using "guilt by association" in network data is due to the tendency of methods to simply assign new functions to already well-annotated genes. While such predictions will tend to be correct, they are generic; it is true, but not very helpful, that a gene with many functions is more likely to have any function. We have also presented evidence that much of the remaining performance in cross-validation cannot be usefully generalized to new predictions, making progressive improvement in analysis difficult to engineer. Here we summarize our findings about how these problems will affect network analysis, discuss some ongoing responses within the field to these issues, and consolidate some recommendations and speculation, which we hope will modestly increase the reliability and specificity of gene function prediction.

  6. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    Science.gov (United States)

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.

  7. Gene

    Data.gov (United States)

    U.S. Department of Health & Human Services — Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes,...

  8. Epigenetic variation in the serotonin transporter gene predicts resting state functional connectivity strength within the salience-network.

    Science.gov (United States)

    Muehlhan, Markus; Kirschbaum, Clemens; Wittchen, Hans-Ulrich; Alexander, Nina

    2015-11-01

    Genetic variation in the serotonin transporter gene (SLC6A4) has been associated with psychopathology and aberrant brain functioning in a plethora of clinical and imaging studies. In contrast, the neurobiological correlates of epigenetic signatures in SLC6A4, such as DNA methylation profiles, have only recently been explored in human brain imaging research. The pr