WorldWideScience

Sample records for gene predicts extrastriatal

  1. Extrastriatal monoaminergic dysfunction and enhanced microglial activation in idiopathic rapid eye movement sleep behaviour disorder

    DEFF Research Database (Denmark)

    Stokholm, Morten Gersel; Iranzo, Alex; Østergaard, Karen

    2018-01-01

    BACKGROUND: The majority of patients diagnosed with idiopathic rapid eye movement sleep behaviour disorder (iRBD) progress over time to a Lewy-type α-synucleinopathy such as Parkinson's disease or dementia with Lewy bodies. This in vivo molecular imaging study aimed to investigate if extrastriatal...

  2. Extrastriatal dopaminergic changes in Parkinson's disease patients with impulse control disorders.

    Science.gov (United States)

    Lee, Jee-Young; Seo, Seong Ho; Kim, Yu Kyeong; Yoo, Hye Bin; Kim, Young Eun; Song, In Chan; Lee, Jae Sung; Jeon, Beom S

    2014-01-01

    The aim was to investigate the extrastriatal dopaminergic neural changes in relation to the medication-related impulse control disorders (ICD) in Parkinson's disease (PD). A total of 31 subjects (11 and 11 drug-treated PD patients with and without medication-related ICDs and 9 healthy controls) having no other co-morbid psychiatric disorders participated in this study. Each subject underwent dynamic N-(3-[(18)F]fluoropropyl)-2-carbomethoxy-3-(4-iodophenyl) nortropane (FP-CIT) positron emission tomography scans. Binding potentials (BP) at nucleus accumbens, amygdala, orbitofrontal and ventromedial prefrontal cortex (VMPFC), putamen and caudate nucleus were estimated, and whole brain parametric maps of [(18)F]-FP-CIT binding were analysed by original and putaminal normalised manners. Compared with the healthy controls, BPs at both VMPFCs were significantly high and the extrastriatal to putaminal BP ratios at all regions were approximately three times higher in both PD groups. The PD ICD patients showed significantly higher BPs at the right VMPFC and a tendency toward lower BPs at the left nucleus accumbens compared with those free of ICD. The ICD subjects also showed reduced uptakes at both ventral striatal regions in the original parametric analysis and higher uptakes at the left insular and right posterior cingulate cortex and lower uptakes at both ventral pallidums in the putaminal normalised parametric analysis compared with the non-ICD subjects. A great gap in extrastriatal versus striatal dopaminergic fibre degenerations is an intrinsic condition predisposing to ICD in PD. A distinct pattern of extrastriatal changes between the ICD and non-ICD patients could provide further insight into a mechanism of ICD in PD.

  3. Extrastriatal binding of [¹²³I]FP-CIT in the thalamus and pons

    DEFF Research Database (Denmark)

    Koch, Walter; Unterrainer, Marcus; Xiong, Guoming

    2014-01-01

    extrastriatal binding (predominantly due to SERT) and its age and gender dependencies in a large cohort of healthy controls. METHODS: SPECT data from 103 healthy controls with well-defined criteria of normality acquired at 13 different imaging centres were analysed for extrastriatal binding using volumes...... error) of 8.2 ± 1.3 % for the thalamus and 6.8 ± 2.9 % for the pons was shown. CONCLUSION: The potential to evaluate extrastriatal predominant SERT binding in addition to the striatal DAT in a single imaging session was shown using a large database of [(123)I]FP-CIT scans in healthy controls. For both...... the thalamus and the pons, an age-related decline in radiotracer binding was observed. Gender effects were demonstrated for binding in the thalamus only. As a potential clinical application, the data could be used as a reference to estimate SERT occupancy in addition to nigrostriatal integrity when using [(123...

  4. Vertebrate gene predictions and the problem of large genes

    DEFF Research Database (Denmark)

    Wang, Jun; Li, ShengTing; Zhang, Yong

    2003-01-01

    To find unknown protein-coding genes, annotation pipelines use a combination of ab initio gene prediction and similarity to experimentally confirmed genes or proteins. Here, we show that although the ab initio predictions have an intrinsically high false-positive rate, they also have a consistent...

  5. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

    Science.gov (United States)

    Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian

    2018-02-23

    Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate the given gene products carrying out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from a massive set of GO terms as defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The codes of HPHash are available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Acute effect of intravenously applied alcohol in the human striatal and extrastriatal D2/D3 dopamine system.

    Science.gov (United States)

    Pfeifer, Philippe; Tüscher, Oliver; Buchholz, Hans Georg; Gründer, Gerhard; Vernaleken, Ingo; Paulzen, Michael; Zimmermann, Ulrich S; Maus, Stephan; Lieb, Klaus; Eggermann, Thomas; Fehr, Christoph; Schreckenberger, Mathias

    2017-09-01

    Investigations on the acute effects of alcohol in the human mesolimbic dopamine D2/D3 receptor system have yielded conflicting results. With respect to the effects of alcohol on extrastriatal D2/D3 dopamine receptors, no investigations have been reported yet. Therefore we applied PET imaging using the postsynaptic dopamine D2/D3 receptor ligand [18F]fallypride, addressing the question of whether intravenously applied alcohol stimulates the extrastriatal and striatal dopamine system. We measured subjective effects of alcohol and made correlation analyses with the striatal and extrastriatal D2/D3 binding potential. Twenty-four healthy male μ-opioid receptor (OPRM1)118G allele carriers underwent a standardized intravenous and placebo alcohol administration. The subjective effects of alcohol were measured with a visual analogue scale. For the evaluation of the dopamine response we calculated the binding potential (BP_ND) by using the simplified reference tissue model (SRTM). In addition, we calculated distribution volumes (target and reference regions) in 10 subjects for which metabolite corrected arterial samples were available. In the alcohol condition no significant dopamine response in terms of a reduction of BP_ND was observed in striatal and extrastriatal brain regions. We found a positive correlation for 'liking' alcohol and the BP_ND in extrastriatal brain regions (inferior frontal cortex (IFC) (r = 0.533, p = 0.007), orbitofrontal cortex (OFC) (r = 0.416, p = 0.043) and prefrontal cortex (PFC) (r = 0.625, p = 0.001)). The acute alcohol effects on the D2/D3 dopamine receptor binding potential of the striatal and extrastriatal system in our experiment were insignificant. A positive correlation of the subjective effect of 'liking' alcohol with cortical D2/D3 receptors may hint at an addiction relevant trait. © 2016 Society for the Study of Addiction.

  7. Combining gene prediction methods to improve metagenomic gene annotation

    Directory of Open Access Journals (Sweden)

    Rosen Gail L

    2011-01-01

    Background: Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation. Results: We not only analyze the programs' performance at different read-lengths like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false-positives and false-negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in 4% improvement in accuracy for 100 bp reads with ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset. Conclusions: To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
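
    The two combination rules named in the conclusions (majority vote and intersection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name combine_calls, the boolean per-read representation, and the placeholder name third_program (the abstract does not name the third predictor) are all assumptions made for illustration.

```python
from typing import Dict, List


def combine_calls(calls: Dict[str, List[bool]], rule: str = "majority") -> List[bool]:
    """Combine per-read coding/non-coding calls from several predictors.

    calls maps a predictor name to one boolean per read (True = coding).
    rule is "majority" (more than half agree) or "unanimous" (all agree,
    i.e. the intersection of the predictors' coding calls).
    """
    predictors = list(calls.values())
    if not predictors:
        raise ValueError("at least one predictor is required")
    n_reads = len(predictors[0])
    if any(len(p) != n_reads for p in predictors):
        raise ValueError("all predictors must score the same reads")
    if rule not in ("majority", "unanimous"):
        raise ValueError(f"unknown rule: {rule}")

    combined = []
    for votes in zip(*predictors):  # one tuple of votes per read
        yes = sum(votes)
        if rule == "majority":
            combined.append(2 * yes > len(votes))
        else:
            combined.append(yes == len(votes))
    return combined


# Hypothetical calls for four reads; "third_program" is a placeholder name.
calls = {
    "GeneMark": [True, True, False, False],
    "Orphelia": [True, False, False, True],
    "third_program": [True, True, False, False],
}
majority = combine_calls(calls, "majority")
pair = {name: calls[name] for name in ("GeneMark", "Orphelia")}
intersection = combine_calls(pair, "unanimous")
```

    In this toy input the majority rule keeps the second read as coding while the GeneMark/Orphelia intersection drops it, which mirrors the specificity-versus-sensitivity trade-off the abstract describes.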

  8. [18F]fallypride characterization of striatal and extrastriatal D2/3 receptors in Parkinson's disease.

    Science.gov (United States)

    Stark, Adam J; Smith, Christopher T; Petersen, Kalen J; Trujillo, Paula; van Wouwe, Nelleke C; Donahue, Manus J; Kessler, Robert M; Deutch, Ariel Y; Zald, David H; Claassen, Daniel O

    2018-01-01

    Parkinson's disease (PD) is characterized by widespread degeneration of monoaminergic (especially dopaminergic) networks, manifesting with a number of both motor and non-motor symptoms. Regional alterations to dopamine D2/3 receptors in PD patients are documented in striatal and some extrastriatal areas, and medications that target D2/3 receptors can improve motor and non-motor symptoms. However, data regarding the combined pattern of D2/3 receptor binding in both striatal and extrastriatal regions in PD are limited. We studied 35 PD patients off-medication and 31 age- and sex-matched healthy controls (HCs) using PET imaging with [18F]fallypride, a high affinity D2/3 receptor ligand, to measure striatal and extrastriatal D2/3 nondisplaceable binding potential (BP_ND). PD patients completed PET imaging in the off medication state, and motor severity was concurrently assessed. Voxel-wise evaluation between groups revealed significant BP_ND reductions in PD patients in striatal and several extrastriatal regions, including the locus coeruleus and mesotemporal cortex. A region-of-interest (ROI) based approach quantified differences in dopamine D2/3 receptors, where reduced BP_ND was noted in the globus pallidus, caudate, amygdala, hippocampus, ventral midbrain, and thalamus of PD patients relative to HC subjects. Motor severity positively correlated with D2/3 receptor density in the putamen and globus pallidus. These findings support the hypothesis that abnormal D2/3 expression occurs in regions related to both the motor and non-motor symptoms of PD, including areas richly invested with noradrenergic neurons.

  9. Blood Gene Expression Predicts Bronchiolitis Obliterans Syndrome

    Directory of Open Access Journals (Sweden)

    Richard Danger

    2018-01-01

    Bronchiolitis obliterans syndrome (BOS), the main manifestation of chronic lung allograft dysfunction, leads to poor long-term survival after lung transplantation. Identifying predictors of BOS is essential to prevent the progression of dysfunction before irreversible damage occurs. By using a large set of 107 samples from lung recipients, we performed microarray gene expression profiling of whole blood to identify early biomarkers of BOS, including samples from 49 patients with stable function for at least 3 years, 32 samples collected at least 6 months before BOS diagnosis (prediction group), and 26 samples at or after BOS diagnosis (diagnosis group). An independent set from 25 lung recipients was used for validation by quantitative PCR (13 stables, 11 in the prediction group, and 8 in the diagnosis group). We identified 50 transcripts differentially expressed between stable and BOS recipients. Three genes, namely POU class 2 associating factor 1 (POU2AF1), T-cell leukemia/lymphoma protein 1A (TCL1A), and B cell lymphocyte kinase, were validated as predictive biomarkers of BOS more than 6 months before diagnosis, with areas under the curve of 0.83, 0.77, and 0.78 respectively. These genes allow stratification based on BOS risk (log-rank test p < 0.01) and are not associated with time posttransplantation. This is the first published large-scale gene expression analysis of blood after lung transplantation. The three-gene blood signature could provide clinicians with new tools to improve follow-up and adapt treatment of patients likely to develop BOS.

  10. Predicting Hydrologic Function With Aquatic Gene Fragments

    Science.gov (United States)

    Good, S. P.; URycki, D. R.; Crump, B. C.

    2018-03-01

    Recent advances in microbiology techniques, such as genetic sequencing, allow for rapid and cost-effective collection of large quantities of genetic information carried within water samples. Here we posit that the unique composition of aquatic DNA material within a water sample contains relevant information about hydrologic function at multiple temporal scales. In this study, machine learning was used to develop discharge prediction models trained on the relative abundance of bacterial taxa classified into operational taxonomic units (OTUs) based on 16S rRNA gene sequences from six large arctic rivers. We term this approach "genohydrology," and show that OTU relative abundances can be used to predict river discharge at monthly and longer timescales. Based on a single DNA sample from each river, the average Nash-Sutcliffe efficiency (NSE) for predicted mean monthly discharge values throughout the year was 0.84, while the NSE for predicted discharge values across different return intervals was 0.67. These are considerable improvements over predictions based only on the area-scaled mean specific discharge of five similar rivers, which had average NSE values of 0.64 and -0.32 for seasonal and recurrence interval discharge values, respectively. The genohydrology approach demonstrates that genetic diversity within the aquatic microbiome is a large and underutilized data resource with benefits for prediction of hydrologic function.
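
    For readers unfamiliar with the Nash-Sutcliffe efficiency (NSE) reported above, the standard definition is one minus the ratio of the squared prediction error to the variance of the observations about their mean. The sketch below implements that textbook definition only; the function name and the toy numbers are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect match, 0.0 is no better than predicting the observed
    mean, and negative values are worse than predicting the observed mean.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - squared_error / spread


# Toy discharge values (arbitrary units), purely illustrative.
perfect = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
reversed_fit = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

    Read this way, the reported value of -0.32 for the area-scaled baseline means that baseline did worse than simply predicting the mean of the observed discharge values.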

  11. Predicting cellular growth from gene expression signatures.

    Directory of Open Access Journals (Sweden)

    Edoardo M Airoldi

    2009-01-01

    Maintaining balanced growth in a changing environment is a fundamental systems-level challenge for cellular physiology, particularly in microorganisms. While the complete set of regulatory and functional pathways supporting growth and cellular proliferation are not yet known, portions of them are well understood. In particular, cellular proliferation is governed by mechanisms that are highly conserved from unicellular to multicellular organisms, and the disruption of these processes in metazoans is a major factor in the development of cancer. In this paper, we develop statistical methodology to identify quantitative aspects of the regulatory mechanisms underlying cellular proliferation in Saccharomyces cerevisiae. We find that the expression levels of a small set of genes can be exploited to predict the instantaneous growth rate of any cellular culture with high accuracy. The predictions obtained in this fashion are robust to changing biological conditions, experimental methods, and technological platforms. The proposed model is also effective in predicting growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the gene expression signature that the predictions are based upon from multiple perspectives: by perturbing the regulatory network through the Ras/PKA pathway, observing strong upregulation of growth rate even in the absence of appropriate nutrients, and discovering putative transcription factor binding sites, observing enrichment in growth-correlated genes. More broadly, the proposed methodology enables biological insights about growth at an instantaneous time scale, inaccessible by direct experimental methods. Data and tools enabling others to apply our methods are available at http://function.princeton.edu/growthrate.

  12. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm.
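
    The TRN20-TST80 scheme described above is a random partition with a small training set and a large testing set. A minimal sketch of one such partition follows, assuming a simple shuffle of accession identifiers; the function name, the seed handling, and the identifier format are illustrative assumptions, and the study's own resampling procedure may differ in detail.

```python
import random
from typing import List, Sequence, Tuple


def random_split(accessions: Sequence[str], train_fraction: float = 0.2,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle accessions and split them into (training, testing) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(accessions)
    random.Random(seed).shuffle(shuffled)  # local generator, reproducible per seed
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 8416 Mexican accessions.
ids = [f"MEX{i:04d}" for i in range(8416)]
train, test = random_split(ids, train_fraction=0.2, seed=1)
```

    Repeating the split with different seeds and averaging the resulting accuracies is the usual way to turn a single partition into a random cross-validation estimate.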

  13. Ecological transition predictably associated with gene degeneration.

    Science.gov (United States)

    Wessinger, Carolyn A; Rausher, Mark D

    2015-02-01

    Gene degeneration or loss can significantly contribute to phenotypic diversification, but may generate genetic constraints on future evolutionary trajectories, potentially restricting phenotypic reversal. Such constraints may manifest as directional evolutionary trends when parallel phenotypic shifts consistently involve gene degeneration or loss. Here, we demonstrate that widespread parallel evolution in Penstemon from blue to red flowers predictably involves the functional inactivation and degeneration of the enzyme flavonoid 3',5'-hydroxylase (F3'5'H), an anthocyanin pathway enzyme required for the production of blue floral pigments. Other types of genetic mutations do not consistently accompany this phenotypic shift. This pattern may be driven by the relatively large mutational target size of degenerative mutations to this locus and the apparent lack of associated pleiotropic effects. The consistent degeneration of F3'5'H may provide a mechanistic explanation for the observed asymmetry in the direction of flower color evolution in Penstemon: Blue to red transitions are common, but reverse transitions have not been observed. Although phenotypic shifts in this system are likely driven by natural selection, internal constraints may generate predictable genetic outcomes and may restrict future evolutionary trajectories. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  14. A comparative analysis of soft computing techniques for gene prediction.

    Science.gov (United States)

    Goel, Neelam; Singh, Shailendra; Aseri, Trilok Chand

    2013-07-01

    The rapid growth of genomic sequence data for both human and nonhuman species has made analyzing these sequences, especially predicting genes in them, very important, and this analysis is currently the focus of many research efforts. Besides its scientific interest in the molecular biology and genomics community, gene prediction is of considerable importance in human health and medicine. A variety of gene prediction techniques have been developed for eukaryotes over the past few years. This article reviews and analyzes the application of certain soft computing techniques in gene prediction. First, the problem of gene prediction and its challenges are described. These are followed by different soft computing techniques along with their application to gene prediction. In addition, a comparative analysis of different soft computing techniques for gene prediction is given. Finally, some limitations of the current research activities and future research directions are provided. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Clinicopathologic and gene expression parameters predict liver cancer prognosis

    International Nuclear Information System (INIS)

    Hao, Ke; Zhong, Hua; Greenawalt, Danielle; Ferguson, Mark D; Ng, Irene O; Sham, Pak C; Poon, Ronnie T; Molony, Cliona; Schadt, Eric E; Dai, Hongyue; Luk, John M; Lamb, John; Zhang, Chunsheng; Xie, Tao; Wang, Kai; Zhang, Bin; Chudin, Eugene; Lee, Nikki P; Mao, Mao

    2011-01-01

    The prognosis of hepatocellular carcinoma (HCC) varies following surgical resection and the large variation remains largely unexplained. Studies have revealed the ability of clinicopathologic parameters and gene expression to predict HCC prognosis. However, there has been little systematic effort to compare the performance of these two types of predictors or combine them in a comprehensive model. Tumor and adjacent non-tumor liver tissues were collected from 272 ethnic Chinese HCC patients who received curative surgery. We combined clinicopathologic parameters and gene expression data (from both tissue types) in predicting HCC prognosis. Cross-validation and independent studies were employed to assess prediction. HCC prognosis was significantly associated with six clinicopathologic parameters, which can partition the patients into good- and poor-prognosis groups. Within each group, gene expression data further divide patients into distinct prognostic subgroups. Our predictive genes significantly overlap with previously published gene sets predictive of prognosis. Moreover, the predictive genes were enriched for genes that underwent normal-to-tumor gene network transformation. Previously documented liver eSNPs underlying the HCC predictive gene signatures were enriched for SNPs that associated with HCC prognosis, providing support that these genes are involved in key processes of tumorigenesis. When applied individually, clinicopathologic parameters and gene expression offered similar predictive power for HCC prognosis. In contrast, a combination of the two types of data dramatically improved the power to predict HCC prognosis. Our results also provided a framework for understanding the impact of gene expression on the processes of tumorigenesis and clinical outcome.

  16. Correlation of individual differences in schizotypal personality traits with amphetamine-induced dopamine release in striatal and extrastriatal brain regions.

    Science.gov (United States)

    Woodward, Neil D; Cowan, Ronald L; Park, Sohee; Ansari, M Sib; Baldwin, Ronald M; Li, Rui; Doop, Mikisha; Kessler, Robert M; Zald, David H

    2011-04-01

    Schizotypal personality traits are associated with schizophrenia spectrum disorders, and individuals with schizophrenia spectrum disorders demonstrate increased dopamine transmission in the striatum. The authors sought to determine whether individual differences in normal variation in schizotypal traits are correlated with dopamine transmission in the striatum and in extrastriatal brain regions. Sixty-three healthy volunteers with no history of psychiatric illness completed the Schizotypal Personality Questionnaire and underwent positron emission tomography imaging with [(18)F]fallypride at baseline and after administration of oral d-amphetamine (0.43 mg/kg). Dopamine release, quantified by subtracting each participant's d-amphetamine scan from his or her baseline scan, was correlated with Schizotypal Personality Questionnaire total and factor scores using region-of-interest and voxel-wise analyses. Dopamine release in the striatum was positively correlated with overall schizotypal traits. The association was especially robust in the associative subdivision of the striatum. Voxel-wise analyses identified additional correlations between dopamine release and schizotypal traits in the left middle frontal gyrus and left supramarginal gyrus. Exploratory analyses of Schizotypal Personality Questionnaire factor scores revealed correlations between dopamine release and disorganized schizotypal traits in the striatum, thalamus, medial prefrontal cortex, temporal lobe, insula, and inferior frontal cortex. The association between dopamine signaling and psychosis phenotypes extends to individual differences in normal variation in schizotypal traits and involves dopamine transmission in both striatal and extrastriatal brain regions. Amphetamine-induced dopamine release may be a useful endophenotype for investigating the genetic basis of schizophrenia spectrum disorders.

  17. An algorithm to discover gene signatures with predictive potential

    Directory of Open Access Journals (Sweden)

    Hallett Robin M

    2010-09-01

    Background: The advent of global gene expression profiling has generated unprecedented insight into our molecular understanding of cancer, including breast cancer. For example, human breast cancer patients display significant diversity in terms of their survival, recurrence, metastasis as well as response to treatment. These patient outcomes can be predicted by the transcriptional programs of their individual breast tumors. Predictive gene signatures allow us to correctly classify human breast tumors into various risk groups as well as to more accurately target therapy to ensure more durable cancer treatment. Results: Here we present a novel algorithm to generate gene signatures with predictive potential. The method first classifies the expression intensity for each gene as determined by global gene expression profiling as low, average or high. The matrix containing the classified data for each gene is then used to score the expression of each gene based on its individual ability to predict the patient characteristic of interest. Finally, all examined genes are ranked based on their predictive ability and the most highly ranked genes are included in the master gene signature, which is then ready for use as a predictor. This method was used to accurately predict the survival outcomes in a cohort of human breast cancer patients. Conclusions: We confirmed the capacity of our algorithm to generate gene signatures with bona fide predictive ability. The simplicity of our algorithm will enable biological researchers to quickly generate valuable gene signatures without specialized software or extensive bioinformatics training.
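
    The three steps in the Results (classify each gene's expression as low, average or high; score each gene; rank and keep the top genes) can be sketched as follows. The abstract does not give the thresholds or the scoring rule, so this sketch assumes cut-offs at one standard deviation from the gene's mean and scores a gene by the difference in outcome rate between its "high" and "low" patients. Both choices, and all function names, are illustrative assumptions rather than the published method.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def classify(values: Sequence[float]) -> List[str]:
    """Label each value low/average/high (assumed cut-offs: mean +/- 1 SD)."""
    mu, sd = mean(values), pstdev(values)
    return ["high" if v > mu + sd else "low" if v < mu - sd else "average"
            for v in values]


def score_gene(values: Sequence[float], outcomes: Sequence[int]) -> float:
    """Assumed score: |outcome rate in 'high' patients - rate in 'low' patients|."""
    labels = classify(values)
    high = [o for label, o in zip(labels, outcomes) if label == "high"]
    low = [o for label, o in zip(labels, outcomes) if label == "low"]
    if not high or not low:
        return 0.0  # gene cannot separate patients under this rule
    return abs(mean(high) - mean(low))


def rank_signature(expression: Dict[str, Sequence[float]],
                   outcomes: Sequence[int], top_n: int) -> List[str]:
    """Rank genes by score and return the top_n gene names."""
    scores = {gene: score_gene(vals, outcomes) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data: five patients, outcome 1 = event observed (hypothetical values).
expression = {
    "geneA": [1, 1, 5, 9, 9],
    "geneB": [5, 5, 5, 5, 5],
    "geneC": [9, 1, 5, 1, 9],
}
outcomes = [0, 0, 1, 1, 1]
signature = rank_signature(expression, outcomes, top_n=1)
```

    Here geneA separates the outcomes cleanly and is selected, while the constant geneB and the uninformative geneC score zero.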

  18. Exploring the Optimal Strategy to Predict Essential Genes in Microbes

    Directory of Open Access Journals (Sweden)

    Yao Lu

    2011-12-01

    Accurately predicting essential genes is important in many aspects of biology, medicine and bioengineering. In previous research, we have developed a machine learning based integrative algorithm to predict essential genes in bacterial species. This algorithm lends itself to two approaches for predicting essential genes: learning the traits from known essential genes in the target organism, or transferring essential gene annotations from a closely related model organism. However, for an understudied microbe, each approach has its potential limitations. The first is constricted by the often small number of known essential genes. The second is limited by the availability of model organisms and by evolutionary distance. In this study, we aim to determine the optimal strategy for predicting essential genes by examining four microbes with well-characterized essential genes. Our results suggest that, unless the known essential genes are few, learning from the known essential genes in the target organism usually outperforms transferring essential gene annotations from a related model organism. In fact, the required number of known essential genes is surprisingly small to make accurate predictions. In prokaryotes, when the number of known essential genes is greater than 2% of total genes, this approach already comes close to its optimal performance. In eukaryotes, achieving the same best performance requires over 4% of total genes, reflecting the increased complexity of eukaryotic organisms. Combining the two approaches resulted in an increased performance when the known essential genes are few. Our investigation thus provides key information on accurately predicting essential genes and will greatly facilitate annotations of microbial genomes.

  19. Carbon-11 epidepride: a suitable radioligand for PET investigation of striatal and extrastriatal dopamine D2 receptors

    Energy Technology Data Exchange (ETDEWEB)

    Langer, Oliver; Halldin, Christer (e-mail: christer.halldin@neuro.ks.se); Dolle, Frederic; Swahn, Carl-Gunnar; Olsson, Hans; Lundkvist, Per Karlsson; Hall, Haakan; Sandell, Johan; Vaufrey, Camilla; Loc'h, Christian; Franzoise; Crouzel, Christian; Maziere, Bernard; Farde, Lars

    1999-07-01

    Epidepride [(S)-(-)-N-([1-ethyl-2-pyrrolidinyl]methyl)-5-iodo-2,3-dimethoxybenzamide] binds with a picomolar affinity (Ki = 24 pM) to the dopamine D2 receptor. Iodine-123-labeled epidepride has been used previously to study striatal and extrastriatal dopamine D2 receptors with single photon emission computed tomography (SPECT). Our aim was to label epidepride with carbon-11 for comparative quantitative studies between positron emission tomography (PET) and SPECT. Epidepride was synthesized from its bromo-analogue FLB 457 via the corresponding trimethyl-tin derivative. In an alternative synthetic pathway, the corresponding substituted benzoic acid was reacted with the optically pure aminomethylpyrrolidine-derivative. Demethylation of epidepride gave the desmethyl-derivative, which was reacted with [11C]methyl triflate. Total radiochemical yield was 40-50% within a total synthesis time of 30 min. The specific radioactivity at the end of synthesis was 37-111 GBq/μmol (1,000-3,000 Ci/mmol). Human postmortem whole-hemisphere autoradiography demonstrated dense binding in the caudate putamen, and also in extrastriatal areas such as the thalamus and the neocortex. The binding was inhibited by unlabeled raclopride. PET studies in a cynomolgus monkey demonstrated high uptake in the striatum and in several extrastriatal regions. At 90 min after injection, uptake in the striatum, thalamus and neocortex was about 11, 4, and 2 times higher than in the cerebellum, respectively. Pretreatment experiment with unlabeled raclopride (1 mg/kg) inhibited 50-70% of [11C]epidepride binding. The fraction of unchanged [11C]epidepride in monkey plasma determined by a gradient high performance liquid chromatography (HPLC) method was about 30% of the total radioactivity at 30 min after injection of [11C]epidepride. The availability of [11C]epidepride allows the PET-verification of the data obtained from quantitation studies with...

  20. A hybrid approach of gene sets and single genes for the prediction of survival risks with gene expression data.

    Science.gov (United States)

    Seok, Junhee; Davis, Ronald W; Xiao, Wenzhong

    2015-01-01

    Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analyses of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets hasn't been well studied. In this work, we propose a hybrid method that uses both single gene and gene set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Complementarily, single genes help to supplement incomplete information of gene sets due to our imperfect biomedical knowledge. Through the tests over multiple data sets of cancer and trauma injury, the proposed method showed robust and improved performance compared with the conventional approaches with only single genes or gene sets solely. Additionally, we examined the prediction result in the trauma injury data, and showed that the modules of biological knowledge used in the prediction by the proposed method were highly interpretable in biology. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

  1. Extrastriatal dopamine D-2/3 receptors and cortical grey matter volumes in antipsychotic-naive schizophrenia patients before and after initial antipsychotic treatment

    DEFF Research Database (Denmark)

    Nørbak-Emig, Henrik; Pinborg, Lars H.; Raghava, Jayachandra M.

    2017-01-01

    OBJECTIVES: Long-term dopamine D2/3 receptor blockade, common to all antipsychotics, may underlie progressive brain volume changes observed in patients with chronic schizophrenia. In the present study, we examined associations between cortical volume changes and extrastriatal dopamine D2/3 recept...... binding potentials (BPND) in first-episode schizophrenia patients at baseline and after antipsychotic treatment. METHODS: Twenty-two initially antipsychotic-naïve patients underwent magnetic resonance imaging (MRI), [(123)I]epidepride single-photon emission computerised tomography (SPECT......), and psychopathology assessments before and after 3 months of treatment with either risperidone (N = 13) or zuclopenthixol (N = 9). Twenty healthy controls matched on age, gender and parental socioeconomic status underwent baseline MRI and SPECT. RESULTS: Neither extrastriatal D2/3 receptor BPND at baseline, nor...

  2. Extrastriatal binding of [123I]FP-CIT in the thalamus and pons: gender and age dependencies assessed in a European multicentre database of healthy controls

    International Nuclear Information System (INIS)

    Koch, Walter; Unterrainer, Marcus; Xiong, Guoming; Bartenstein, Peter; Diemling, Markus; Varrone, Andrea; Dickson, John C.; Tossici-Bolt, Livia; Sera, Terez; Asenbaum, Susanne; Booij, Jan; Kapucu, Ozlem L.; Kluge, Andreas; Ziebell, Morten; Darcourt, Jacques; Nobili, Flavio; Pagani, Marco; Hesse, Swen; Borght, Thierry Vander; Laere, Koen van; Tatsch, Klaus; La Fougere, Christian

    2014-01-01

    Apart from binding to the dopamine transporter (DAT), [123I]FP-CIT shows moderate affinity for the serotonin transporter (SERT), allowing imaging of both monoamine transporters in a single imaging session in different brain areas. The aim of this study was to systematically evaluate extrastriatal binding (predominantly due to SERT) and its age and gender dependencies in a large cohort of healthy controls. SPECT data from 103 healthy controls with well-defined criteria of normality acquired at 13 different imaging centres were analysed for extrastriatal binding using volumes of interest analysis for the thalamus and the pons. Data were examined for gender and age effects as well as for potential influence of striatal DAT radiotracer binding. Thalamic binding was significantly higher than pons binding. Partial correlations showed an influence of putaminal DAT binding on measured binding in the thalamus but not on the pons. Data showed high interindividual variation in extrastriatal binding. Significant gender effects with 31 % higher binding in women than in men were observed in the thalamus, but not in the pons. An age dependency with a decline per decade (±standard error) of 8.2 ± 1.3 % for the thalamus and 6.8 ± 2.9 % for the pons was shown. The potential to evaluate extrastriatal predominant SERT binding in addition to the striatal DAT in a single imaging session was shown using a large database of [123I]FP-CIT scans in healthy controls. For both the thalamus and the pons, an age-related decline in radiotracer binding was observed. Gender effects were demonstrated for binding in the thalamus only. As a potential clinical application, the data could be used as a reference to estimate SERT occupancy in addition to nigrostriatal integrity when using [123I]FP-CIT for DAT imaging in patients treated with selective serotonin reuptake inhibitors. (orig.)

  3. Extrastriatal binding of [123I]FP-CIT in the thalamus and pons: gender and age dependencies assessed in a European multicentre database of healthy controls

    Energy Technology Data Exchange (ETDEWEB)

    Koch, Walter; Unterrainer, Marcus; Xiong, Guoming; Bartenstein, Peter [University of Munich, Department of Nuclear Medicine, Munich (Germany); Diemling, Markus [Hermes Medical Solutions, Stockholm (Sweden); Varrone, Andrea [Karolinska University Hospital, Karolinska Institutet, Department of Clinical Neuroscience, Centre for Psychiatry Research, Stockholm (Sweden); Dickson, John C. [UCLH NHS Foundation Trust and University College, Institute of Nuclear Medicine, London (United Kingdom); Tossici-Bolt, Livia [University Hospitals Southampton NHS Trust, Department of Medical Physics, Southampton (United Kingdom); Sera, Terez [University of Szeged, Department of Nuclear Medicine and Euromedic Szeged, Szeged (Hungary); Asenbaum, Susanne [Medical University of Vienna, Department of Neurology, Vienna (Austria); Booij, Jan [University of Amsterdam, Department of Nuclear Medicine, Academic Medical Centre, Amsterdam (Netherlands); Kapucu, Ozlem L. [Gazi University, Department of Nuclear Medicine, Faculty of Medicine, Ankara (Turkey); Kluge, Andreas [ABX-CRO, Dresden (Germany); Ziebell, Morten [Rigshospitalet and University of Copenhagen, Neurobiology Research Unit, Copenhagen (Denmark); Darcourt, Jacques [University of Nice-Sophia Antipolis, Nuclear Medicine Department, Centre Antoine Lacassagne, Nice (France); Nobili, Flavio [University of Genoa, Clinical Neurology Unit, Department of Neuroscience (DINOGMI), Genoa (Italy); Pagani, Marco [CNR, Institute of Cognitive Sciences and Technologies, Rome (Italy); Karolinska Hospital, Department of Nuclear Medicine, Stockholm (Sweden); Hesse, Swen [University of Leipzig, Department of Nuclear Medicine, Leipzig (Germany); Leipzig University Medical Centre, Molecular Neuroimaging IFB Adiposity Diseases, Leipzig (Germany); Borght, Thierry Vander [Universite Catholique de Louvain, Nuclear Medicine Division, CHU Dinant Godinne, Yvoir (Belgium); Laere, Koen van [University Hospital and K.U. Leuven, Nuclear Medicine, Leuven (Belgium); Tatsch, Klaus [Staedtisches Klinikum Karlsruhe, Department of Nuclear Medicine, Karlsruhe (Germany); La Fougere, Christian [University of Munich, Department of Nuclear Medicine, Munich (Germany); University of Tuebingen, Department of Nuclear Medicine, Tuebingen (Germany)

    2014-10-15

    Apart from binding to the dopamine transporter (DAT), [123I]FP-CIT shows moderate affinity for the serotonin transporter (SERT), allowing imaging of both monoamine transporters in a single imaging session in different brain areas. The aim of this study was to systematically evaluate extrastriatal binding (predominantly due to SERT) and its age and gender dependencies in a large cohort of healthy controls. SPECT data from 103 healthy controls with well-defined criteria of normality acquired at 13 different imaging centres were analysed for extrastriatal binding using volumes of interest analysis for the thalamus and the pons. Data were examined for gender and age effects as well as for potential influence of striatal DAT radiotracer binding. Thalamic binding was significantly higher than pons binding. Partial correlations showed an influence of putaminal DAT binding on measured binding in the thalamus but not on the pons. Data showed high interindividual variation in extrastriatal binding. Significant gender effects with 31 % higher binding in women than in men were observed in the thalamus, but not in the pons. An age dependency with a decline per decade (±standard error) of 8.2 ± 1.3 % for the thalamus and 6.8 ± 2.9 % for the pons was shown. The potential to evaluate extrastriatal predominant SERT binding in addition to the striatal DAT in a single imaging session was shown using a large database of [123I]FP-CIT scans in healthy controls. For both the thalamus and the pons, an age-related decline in radiotracer binding was observed. Gender effects were demonstrated for binding in the thalamus only. As a potential clinical application, the data could be used as a reference to estimate SERT occupancy in addition to nigrostriatal integrity when using [123I]FP-CIT for DAT imaging in patients treated with selective serotonin reuptake inhibitors. (orig.)

  4. Preliminary assessment of extrastriatal dopamine D-2 receptor binding in the rodent and nonhuman primate brains using the high affinity radioligand, 18F-fallypride

    Energy Technology Data Exchange (ETDEWEB)

    Mukherjee, Jogeshwar (e-mail: jogeshwar-mukherjee@ketthealth.com); Yang, Z.-Y.; Brown, Terry; Lew, Robert; Wernick, Miles; Ouyang Xiaohu; Yasillo, Nicholas; Chen, C.-T.; Mintzer, Robert; Cooper, Malcolm

    1999-07-01

    We have identified the value of 18F-fallypride {(S)-N-[(1-allyl-2-pyrrolidinyl)methyl]-5-(3-[18F]fluoropropyl)-2,3-dimethoxybenzamide}, as a dopamine D-2 receptor radiotracer for the study of striatal and extrastriatal receptors. Fallypride exhibits high affinities for D-2 and D-3 subtypes and low affinity for D-4 (3H-spiperone IC50s: D-2 = 0.05 nM [rat striata], D-3 = 0.30 nM [SF9 cell lines, rat recombinant], and D-4 = 240 nM [CHO cell lines, human recombinant]). Biodistribution in the rat brain showed localization of 18F-fallypride in striata and extrastriatal regions such as the frontal cortex, parietal cortex, amygdala, hippocampus, thalamus, and hypothalamus. In vitro autoradiographic studies in sagittal slices of the rat brain showed localization of 18F-fallypride in striatal and several extrastriatal regions, including the medulla. Positron emission tomography (PET) experiments with 18F-fallypride in male rhesus monkeys were carried out in a PET VI scanner. In several PET experiments, apart from the specific binding seen in the striatum, specific binding of 18F-fallypride was also identified in extrastriatal regions (in a lower brain slice, possibly the thalamus). Specific binding in the extrastriata was, however, significantly lower compared with that observed in the striata of the monkeys (extrastriata/cerebellum = 2, striata/cerebellum = 10). Postmortem analysis of the monkey brain revealed significant 18F-fallypride binding in the striata, whereas binding was also observed in extrastriatal regions such as the thalamus, cortical areas, and brain stem.

  5. Gene prediction validation and functional analysis of redundant pathways

    DEFF Research Database (Denmark)

    Sønderkær, Mads

    2011-01-01

    have employed a large mRNA-seq data set to improve and validate ab initio predicted gene models. This direct experimental evidence also provides reliable determinations of UTR regions and polyadenylation sites, which are not easily predicted in plants. Furthermore, once an annotated genome sequence...... is available, gene expression by mRNA-Seq enables acquisition of a more complete overview of gene isoform usage in complex enzymatic pathways enabling the identification of key genes. Metabolism in potatoes This information is useful e.g. for crop improvement based on manipulation of agronomically important...

  6. A network approach to predict pathogenic genes for Fusarium graminearum.

    Science.gov (United States)

    Liu, Xiaoping; Tang, Wei-Hua; Zhao, Xing-Ming; Chen, Luonan

    2010-10-04

    Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), a destructive disease of wheat and barley that causes huge economic losses and, by contaminating food, health problems in humans. Identifying pathogenic genes can shed light on the pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in the lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into the plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on the molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as the seed genes, which implies that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, the G protein coupled receptor pathway and the MAPK signaling pathway, which are known to be related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other pathogenic fungi, which

  7. Reranking candidate gene models with cross-species comparison for improved gene prediction

    Directory of Open Access Journals (Sweden)

    Pereira Fernando CN

    2008-10-01

    Full Text Available Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
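
    The reranking step described in this record can be illustrated with a minimal sketch. The abstract does not give the actual scoring rules, so the linear combination below, the `weight` parameter, and the `Candidate` fields are assumptions made only for illustration; they are not Evigan's interface or the authors' scoring function.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One of the K best gene models for a locus (hypothetical fields)."""
    name: str
    finder_score: float         # score from the original gene finder, scaled to [0, 1]
    ortholog_similarity: float  # similarity to a putative ortholog, scaled to [0, 1]


def rerank(candidates, weight=0.5):
    """Sort candidates by a weighted mix of finder score and ortholog similarity.

    weight=0.0 reproduces the original ranking; larger weights let
    cross-species evidence promote lower-scored models.
    """
    def combined(c):
        return (1.0 - weight) * c.finder_score + weight * c.ortholog_similarity

    return sorted(candidates, key=combined, reverse=True)


models = [Candidate("m1", 0.9, 0.2), Candidate("m2", 0.8, 0.9)]
best = rerank(models)[0]
```

    With these made-up numbers, the second model overtakes the first once ortholog similarity is taken into account, which mirrors the abstract's point that a lower-scored model may be selected when it closely matches probable orthologs.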

  8. Neural Inductive Matrix Completion for Predicting Disease-Gene Associations

    KAUST Repository

    Hou, Siqing

    2018-05-21

    In silico prioritization of undiscovered associations can help find causal genes of newly discovered diseases. Some existing methods are based on known associations, and side information of diseases and genes. We exploit the possibility of using a neural network model, Neural inductive matrix completion (NIMC), in disease-gene prediction. Compared to the state-of-the-art inductive matrix completion method, using neural networks allows us to learn latent features from non-linear functions of input features. Previous methods use disease features only from mining text. Compared to text mining, disease ontology is a more informative way of discovering correlation of diseases, from which we can calculate the similarities between diseases and help increase the performance of predicting disease-gene associations. We compare the proposed method with other state-of-the-art methods for predicting associated genes for diseases from the Online Mendelian Inheritance in Man (OMIM) database. Results show that both new features and the proposed NIMC model can improve the chance of recovering an unknown associated gene in the top 100 predicted genes. Best results are obtained by using both the new features and the new model. Results also show the proposed method does better in predicting associated genes for newly discovered diseases.

  9. Semi-supervised prediction of gene regulatory networks using ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging ... two types of methods differ primarily based on whether ..... negligible, allowing us to draw the qualitative conclusions .... research will be conducted to develop additional biologically.

  10. Embryo quality predictive models based on cumulus cells gene expression

    Directory of Open Access Journals (Sweden)

    Devjak R

    2016-06-01

    Full Text Available Since the introduction of in vitro fertilization (IVF) in clinical practice of infertility treatment, the indicators for high quality embryos were investigated. Cumulus cells (CC) have a specific gene expression profile according to the developmental potential of the oocyte they are surrounding, and therefore, specific gene expression could be used as a biomarker. The aim of our study was to combine more than one biomarker to observe improvement in prediction value of embryo development. In this study, 58 CC samples from 17 IVF patients were analyzed. This study was approved by the Republic of Slovenia National Medical Ethics Committee. Gene expression analysis [quantitative real time polymerase chain reaction (qPCR)] for five genes, analyzed according to embryo quality level, was performed. Two prediction models were tested for embryo quality prediction: a binary logistic and a decision tree model. As the main outcome, gene expression levels for five genes were taken and the area under the curve (AUC) for two prediction models were calculated. Among tested genes, AMHR2 and LIF showed significant expression difference between high quality and low quality embryos. These two genes were used for the construction of two prediction models: the binary logistic model yielded an AUC of 0.72 ± 0.08 and the decision tree model yielded an AUC of 0.73 ± 0.03. Two different prediction models yielded similar predictive power to differentiate high and low quality embryos. In terms of eventual clinical decision making, the decision tree model resulted in easy-to-interpret rules that are highly applicable in clinical practice.
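
    For readers unfamiliar with the AUC values quoted in this record, the following sketch shows one standard way to compute an AUC from model scores: as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name and the example scores are hypothetical; the study's own data and software are not given in the abstract.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores_pos: model scores for the positive class (e.g. high quality embryos)
    scores_neg: model scores for the negative class (e.g. low quality embryos)
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up scores: three of the four positive/negative pairs are ordered correctly.
example = auc([0.9, 0.4], [0.5, 0.1])
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported values of about 0.72 and 0.73 indicate moderate discrimination.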

  11. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning.

    Science.gov (United States)

    He, Zhili; Zhang, Ping; Wu, Linwei; Rocha, Andrea M; Tu, Qichao; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D; Wu, Liyou; Yang, Yunfeng; Elias, Dwayne A; Watson, David B; Adams, Michael W W; Fields, Matthew W; Alm, Eric J; Hazen, Terry C; Adams, Paul D; Arkin, Adam P; Zhou, Jizhong

    2018-02-20

    Contamination from anthropogenic activities has significantly impacted Earth's biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly ( P contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. IMPORTANCE Disentangling the relationships between biodiversity and ecosystem functioning is an important but poorly understood topic in ecology. Predicting ecosystem functioning on the basis of biodiversity is even more difficult, particularly with microbial biomarkers. As an exploratory effort, this study used key microbial functional genes as biomarkers to provide predictive understanding of environmental contamination and ecosystem functioning. The results indicated that the overall functional gene richness/diversity decreased as uranium increased in groundwater, while specific key microbial guilds increased significantly as

  12. The effects of d-amphetamine on extrastriatal dopamine D2/D3 receptors: a randomized, double-blind, placebo-controlled PET study with [11C]FLB 457 in healthy subjects

    Energy Technology Data Exchange (ETDEWEB)

    Aalto, Sargo [University of Turku, Turku PET Centre, Turku (Finland); Aabo Akademi University, Department of Psychology, Turku (Finland); Hirvonen, Jussi; Kajander, Jaana; Naagren, Kjell; Rinne, Juha O. [University of Turku, Turku PET Centre, Turku (Finland); Kaasinen, Valtteri [University of Turku, Department of Neurology, P.O. Box 52, Turku (Finland); Hagelberg, Nora [University of Turku, Turku PET Centre, Turku (Finland); Turku University Central Hospital, Department of Anaesthesiology, Intensive Care, Emergency Care and Pain Medicine, Turku (Finland); Seppaelae, Timo [Drug Research Unit, National Public Health Institute, Helsinki (Finland); Scheinin, Harry [University of Turku, Turku PET Centre, Turku (Finland); University of Turku, Department of Pharmacology, Drug Development and Therapeutics, Turku (Finland); Hietala, Jarmo [University of Turku, Turku PET Centre, Turku (Finland); University of Turku, Department of Psychiatry, Turku (Finland)

    2009-03-15

    The dopamine D2/D3 receptor ligand [11C]FLB 457 and PET enable quantification of low-density extrastriatal D2/D3 receptors, but it is uncertain whether [11C]FLB 457 can be used for measuring extrastriatal dopamine release. We studied the effects of d-amphetamine (0.3 mg/kg i.v.) on extrastriatal [11C]FLB 457 binding potential (BPND) in a randomized, double-blind, placebo-controlled study including 24 healthy volunteers. The effects of d-amphetamine on [11C]FLB 457 BPND and distribution volume (VT) in the frontal cortex were not different from those of placebo. Small decreases in [11C]FLB 457 BPND were observed only in the posterior cingulate and hippocampus. The regional changes in [11C]FLB 457 BPND did not correlate with d-amphetamine-induced changes in subjective ratings of euphoria. This placebo-controlled study showed that d-amphetamine does not induce marked changes in measures of extrastriatal dopamine D2/D3 receptor binding. Our results indicate that [11C]FLB 457 PET is not a useful method for measuring extrastriatal dopamine release in humans. (orig.)

  13. Predictability of Genetic Interactions from Functional Gene Modules

    Directory of Open Access Journals (Sweden)

    Jonathan H. Young

    2017-02-01

    Full Text Available Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.

  14. Prediction of regulatory gene pairs using dynamic time warping and gene ontology.

    Science.gov (United States)

    Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K

    2014-01-01

    Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.
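
    The dynamic time warping component mentioned in this record can be sketched as follows. This is the textbook DTW recurrence with an absolute-difference local cost, shown only to make the idea concrete; it is not the authors' DTW-GO distance, which also incorporates Gene Ontology information not described in the abstract. The function name is an assumption.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric time series.

    cost[i][j] holds the minimal accumulated cost of aligning the first i
    points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Two hypothetical expression profiles with the same shape but different timing.
d = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

    Because DTW allows one time point to align with several points in the other series, the two profiles above are at distance zero even though their lengths differ, which is why the measure suits expression time series that are shifted or stretched relative to each other.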

  15. Global discriminative learning for higher-accuracy computational gene prediction.

    Directory of Open Access Journals (Sweden)

    Axel Bernal

    2007-03-01

    Full Text Available Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.

  16. Random Subspace Aggregation for Cancer Prediction with Gene Expression Profiles

    Directory of Open Access Journals (Sweden)

    Liying Yang

    2016-01-01

    Full Text Available Background. Precisely predicting cancer is crucial for cancer treatment. Gene expression profiles make it possible to analyze patterns between genes and cancers on the genome-wide scale. Gene expression data analysis, however, is confronted with enormous challenges for its characteristics, such as high dimensionality, small sample size, and low Signal-to-Noise Ratio. Results. This paper proposes a method, termed RS_SVM, to predict gene expression profiles via aggregating SVMs trained on random subspaces. After choosing gene features through statistical analysis, RS_SVM randomly selects feature subsets to yield random subspaces, trains SVM classifiers accordingly, and then aggregates the SVM classifiers to capture the advantage of ensemble learning. Experiments on eight real gene expression datasets are performed to validate the RS_SVM method. Experimental results show that RS_SVM achieved better classification accuracy and generalization performance in contrast with single SVM, K-nearest neighbor, decision tree, Bagging, AdaBoost, and the state-of-the-art methods. Experiments also explored the effect of subspace size on prediction performance. Conclusions. The proposed RS_SVM method yielded superior performance in analyzing gene expression profiles, which demonstrates that RS_SVM provides a good channel for such biological data.
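
    The random subspace idea in this record can be sketched in a few lines. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM base learner, so this is not RS_SVM itself; the function names, the ensemble size, the subspace size, and the toy data are all assumptions for illustration.

```python
import random
from collections import Counter


def train_centroids(X, y, features):
    """Base learner: per-class mean over the selected feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row, features):
    """Assign the class whose centroid is closest in the selected subspace."""
    def sq_dist(center):
        return sum((row[f] - center[k]) ** 2 for k, f in enumerate(features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_random_subspaces(X, y, n_models=11, subspace_size=2, seed=0):
    """Train one base learner per randomly drawn feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble


def predict(ensemble, row):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_centroid(c, row, f) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy expression matrix: 4 samples x 4 genes, two classes.
X = [[0.1, 0.0, 0.2, 0.1], [0.0, 0.2, 0.1, 0.0],
     [0.9, 1.0, 0.8, 1.1], [1.0, 0.9, 1.1, 0.9]]
y = [0, 0, 1, 1]
model = fit_random_subspaces(X, y)
```

    Each base learner sees only a small subset of genes, which is one way to cope with the high dimensionality and small sample size that the abstract names as the main difficulties; swapping in an SVM as the base learner would bring the sketch closer to the method described.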

  17. A deep auto-encoder model for gene expression prediction.

    Science.gov (United States)

    Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua

    2017-11-17

    Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With an aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.

  18. Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models.

    Science.gov (United States)

    Mahony, Shaun; McInerney, James O; Smith, Terry J; Golden, Aaron

    2004-03-05

    Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation. This work explores a new approach to gene-prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential. While its raw accuracy rate can be less than other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.
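
    The relative synonymous codon usage (RSCU) measure mentioned in this abstract has a standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch follows, assuming the standard genetic code and an in-frame coding sequence; the function name `rscu` is hypothetical, and how RescueNet feeds these values into the Self-Organizing Map is not shown because the abstract does not describe it.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(sequence):
    """Return RSCU values for the codons of every amino acid seen in the sequence.

    The sequence is read in frame from its first base; a trailing partial
    codon and any codon with non-ACGT characters are ignored.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families (stop codons are skipped).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        expected = total / len(codons)  # count under uniform synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Hypothetical fragment: three GCT and one GCC (alanine), then ATG (methionine).
print(rscu("GCTGCTGCTGCCATG"))
```

    An RSCU of 1.0 means a codon is used exactly as often as expected under uniform usage; values above 1.0 indicate a preferred codon.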

  19. Combining Gene Signatures Improves Prediction of Breast Cancer Survival

    Science.gov (United States)

    Zhao, Xi; Naume, Bjørn; Langerød, Anita; Frigessi, Arnoldo; Kristensen, Vessela N.; Børresen-Dale, Anne-Lise; Lingjærde, Ole Christian

    2011-01-01

    Background Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. Principal Findings To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. Conclusion Combining the predictive strength of multiple gene signatures improves prediction of breast

  20. Combining gene signatures improves prediction of breast cancer survival.

    Directory of Open Access Journals (Sweden)

    Xi Zhao

    Full Text Available BACKGROUND: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. PRINCIPAL FINDINGS: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and the Adjuvant! Online for survival prediction. CONCLUSION: Combining the predictive strength of multiple gene signatures improves

  1. Bioinformatic prediction and functional characterization of human KIAA0100 gene

    Directory of Open Access Journals (Sweden)

    He Cui

    2017-02-01

    Full Text Available Our previous study demonstrated that the human KIAA0100 gene was a novel acute monocytic leukemia-associated antigen (MLAA) gene. However, the functional characterization of the human KIAA0100 gene has remained unknown to date. Here, first, bioinformatic prediction of the human KIAA0100 gene was carried out using online software; second, human KIAA0100 gene expression was downregulated by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system in U937 cells. Cell proliferation and apoptosis were next evaluated in KIAA0100-knockdown U937 cells. The bioinformatic prediction showed that the human KIAA0100 gene was located on 17q11.2, and the human KIAA0100 protein was located in the secretory pathway. In addition, the human KIAA0100 protein contained a signal peptide, a transmembrane region, three types of secondary structures (alpha helix, extended strand, and random coil), and four domains from mitochondrial protein 27 (FMP27). The observation on functional characterization of the human KIAA0100 gene revealed that its downregulation inhibited cell proliferation and promoted cell apoptosis in U937 cells. To summarize, these results suggest the human KIAA0100 gene possibly comes within the mitochondrial genome; moreover, it is a novel anti-apoptotic factor related to carcinogenesis or progression in acute monocytic leukemia, and may be a potential target for immunotherapy against acute monocytic leukemia.

  2. A network approach to predict pathogenic genes for Fusarium graminearum.

    Directory of Open Access Journals (Sweden)

    Xiaoping Liu

    Full Text Available Fusarium graminearum is the pathogenic agent of Fusarium head blight (FHB), which is a destructive disease of wheat and barley, thereby causing huge economic loss and health problems to humans by contaminating foods. Identifying pathogenic genes can shed light on the pathogenesis underlying the interaction between F. graminearum and its plant host. However, it is difficult to detect pathogenic genes for this destructive pathogen by time-consuming and expensive molecular biological experiments in the lab. On the other hand, computational methods provide an alternative way to solve this problem. Since pathogenesis is a complicated procedure that involves complex regulations and interactions, the molecular interaction network of F. graminearum can give clues to potential pathogenic genes. Furthermore, the gene expression data of F. graminearum before and after its invasion into the plant host can also provide useful information. In this paper, a novel systems biology approach is presented to predict pathogenic genes of F. graminearum based on the molecular interaction network and gene expression data. With a small number of known pathogenic genes as seed genes, a subnetwork that consists of potential pathogenic genes is identified from the protein-protein interaction network (PPIN) of F. graminearum, where the genes in the subnetwork are further required to be differentially expressed before and after the invasion of the pathogenic fungus. Therefore, the candidate genes in the subnetwork are expected to be involved in the same biological processes as seed genes, which implies that they are potential pathogenic genes. The prediction results show that most of the pathogenic genes of F. graminearum are enriched in two important signal transduction pathways, including the G protein coupled receptor pathway and the MAPK signaling pathway, which are known to be related to pathogenesis in other fungi. In addition, several pathogenic genes predicted by our method are verified in other

  3. Gene Prediction in Metagenomic Fragments with Deep Learning

    Directory of Open Access Journals (Sweden)

    Shao-Wu Zhang

    2017-01-01

    Full Text Available Next generation sequencing technologies used in metagenomics yield numerous sequencing fragments which come from thousands of different species. Accurately identifying genes from metagenomics fragments is one of the most fundamental issues in metagenomics. In this article, by fusing multifeatures (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using deep stacking networks learning model, we present a novel method (called Meta-MFDL) to predict the metagenomic genes. The results with 10 CV and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.

  4. Gene-specific function prediction for non-synonymous mutations in monogenic diabetes genes.

    Directory of Open Access Journals (Sweden)

    Quan Li

    Full Text Available The rapid progress of genomic technologies has been providing new opportunities to address the need for maturity-onset diabetes of the young (MODY) molecular diagnosis. However, whether a new mutation causes MODY can be questionable. A number of in silico methods have been developed to predict functional effects of rare human mutations. The purpose of this study is to compare the performance of different bioinformatics methods in the functional prediction of nonsynonymous mutations in each MODY gene, and to provide reference matrices to assist the molecular diagnosis of MODY. Our study showed that the prediction scores of the diabetes mutations by different methods were highly correlated, but complemented rather than replaced each other. The available in silico methods for the prediction of diabetes mutations had varied performances across different genes. Applying gene-specific thresholds defined by this study may be able to increase the performance of in silico prediction of disease-causing mutations.

  5. Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Osamu Komori

    2013-01-01

    Full Text Available This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression. We focus on pattern recognition to extract informative features embedded in the data for prediction of phenotypes. It has been pointed out that there are severe difficulties due to the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data to detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative genes with high performance from among all observed genes. We investigate why this difficulty still exists even though analysis methods and learning algorithms are actively being proposed in statistical machine learning. We focus on the mutual coherence, or the absolute value of the Pearson correlation between two genes, and describe the distributions of the correlation for the selected set of genes and the total set. We show that the problem of finding informative genes in high dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence.
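
    The quantity this abstract focuses on, the absolute Pearson correlation between pairs of genes, can be computed directly. The sketch below lists every pairwise value and takes their maximum as the mutual coherence, following the usual definition of that term; whether the authors summarize the distribution in the same way is an assumption, and the gene names and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length vectors with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_abs_correlations(genes):
    """Map each unordered gene pair to the absolute value of its correlation."""
    return {
        (a, b): abs(pearson(genes[a], genes[b]))
        for a, b in combinations(sorted(genes), 2)
    }


def mutual_coherence(genes):
    """Largest absolute correlation found among all gene pairs."""
    return max(pairwise_abs_correlations(genes).values())


# Hypothetical expression levels of three genes across four subjects.
genes = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # a scaled copy of geneA
    "geneC": [1.0, -1.0, -1.0, 1.0],
}
print(mutual_coherence(genes))
```

    A coherence near 1 means at least two genes carry almost the same information, which is one way to see why many different gene sets can reach nearly the same predictive performance.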

  6. The prediction of candidate genes for cervix related cancer through gene ontology and graph theoretical approach.

    Science.gov (United States)

    Hindumathi, V; Kranthi, T; Rao, S B; Manimaran, P

    2014-06-01

    With rapidly changing technology, prediction of candidate genes has become an indispensable task in recent years, mainly in the field of biological research. The empirical methods for candidate gene prioritization that help to explore the potential pathway between genetic determinants and complex diseases are highly cumbersome and labor intensive. In such a scenario, predicting potential targets for a disease state through in silico approaches is of interest to researchers. The prodigious availability of protein interaction data coupled with gene annotation eases the accurate determination of disease-specific candidate genes. In our work we have prioritized the cervix-related cancer candidate genes by employing the approach of Csaba Ortutay and co-workers, which identifies candidate genes through graph theoretical centrality measures and gene ontology. With the advantage of the human protein interaction data, cervical cancer gene sets and the ontological terms, we were able to predict 15 novel candidates for cervical carcinogenesis. The disease relevance of the anticipated candidate genes was corroborated through a literature survey. The presence of drugs for these candidates was also detected through the Therapeutic Target Database (TTD) and DrugMap Central (DMC), which affirms that they may serve as potential drug targets for cervical cancer.
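
    As a rough illustration of ranking candidates by a graph theoretical centrality measure, the sketch below computes degree centrality on a small protein interaction network and ranks the genes that are not already known disease genes. The abstract does not say which centrality measures were used or how gene ontology terms were combined with them, so degree centrality alone is an assumption; the function names and the gene labels G1 to G4 are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected simple graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def rank_candidates(edges, known_genes):
    """Rank genes outside the known set by centrality, highest first."""
    scores = degree_centrality(edges)
    candidates = [g for g in scores if g not in known_genes]
    return sorted(candidates, key=lambda g: (-scores[g], g))


# Hypothetical interaction network; G1 plays the role of a known disease gene.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
print(rank_candidates(edges, known_genes={"G1"}))
```

    Ties are broken alphabetically so that the ranking is deterministic.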

  7. Inductive matrix completion for predicting gene-disease associations.

    Science.gov (United States)

    Natarajan, Nagarajan; Dhillon, Inderjit S

    2014-06-15

    Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies; for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies; for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.

  8. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    International Nuclear Information System (INIS)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-01-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society

  9. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    Energy Technology Data Exchange (ETDEWEB)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.

  10. Prediction of epigenetically regulated genes in breast cancer cell lines

    Energy Technology Data Exchange (ETDEWEB)

    Loss, Leandro A; Sadanandam, Anguraj; Durinck, Steffen; Nautiyal, Shivani; Flaucher, Diane; Carlton, Victoria EH; Moorhead, Martin; Lu, Yontao; Gray, Joe W; Faham, Malek; Spellman, Paul; Parvin, Bahram

    2010-05-04

    panel of breast cancer cell lines. Subnetwork enrichment of these genes has identified 35 common regulators with 6 or more predicted markers. In addition to identifying epigenetically regulated genes, we show evidence of differentially expressed methylation patterns between the basal and luminal subtypes. Our results indicate that the proposed computational protocol is a viable platform for identifying epigenetically regulated genes. Our protocol has generated a list of predictors including COL1A2, TOP2A, TFF1, and VAV3, genes whose key roles in epigenetic regulation are documented in the literature. Subnetwork enrichment of these predicted markers further suggests that epigenetic regulation of individual genes occurs in a coordinated fashion and through common regulators.

  11. Prediction of human protein function according to Gene Ontology categories

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Gupta, Ramneek; Stærfeldt, Hans Henrik

    2003-01-01

    developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories: transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors...

  12. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  13. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that

  14. Combining many interaction networks to predict gene function and analyze gene lists.

    Science.gov (United States)

    Mostafavi, Sara; Morris, Quaid

    2012-05-01

    In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. The role of gene-gene interaction in the prediction of criminal behavior.

    Science.gov (United States)

    Boutwell, Brian B; Menard, Scott; Barnes, J C; Beaver, Kevin M; Armstrong, Todd A; Boisvert, Danielle

    2014-04-01

    A host of research has examined the possibility that environmental risk factors might condition the influence of genes on various outcomes. Less research, however, has been aimed at exploring the possibility that genetic factors might interact to impact the emergence of human traits. Even fewer studies exist examining the interaction of genes in the prediction of behavioral outcomes. The current study expands this body of research by testing the interaction between genes involved in neural transmission. Our findings suggest that certain dopamine genes interact to increase the odds of criminogenic outcomes in a national sample of Americans. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. GOPET: A tool for automated predictions of Gene Ontology terms

    Directory of Open Access Journals (Sweden)

    Glatting Karl-Heinz

    2006-03-01

    Full Text Available Abstract Background Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. Description We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). Now, this method has been made available to the public via our web-service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method had been developed for predicting molecular function GO-terms. It is now expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar Conclusion Our web service gives experimental researchers as well as the bioinformatics community a valuable sequence annotation device. Additionally, GOPET also provides less significant annotation data which may serve as an extended discovery platform for the user.

  17. Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes.

    Science.gov (United States)

    Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

    2017-10-03

    Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.

  18. Striatal and extra-striatal dopamine transporter in cannabis and tobacco addiction: a high resolution PET study

    International Nuclear Information System (INIS)

    Leroy, C.; Martinot, J.L.; Duchesnay, E.; Artiges, E.; Ribeiro, M.J.; Trichard, Ch.; Karila, L.; Lukasiewicz, M.; Benyamina, A.; Reynaud, M.; Comtat, C.

    2011-01-01

    The dopamine (DA) system is known to be involved in the reward and dependence mechanisms of addiction. However, modifications in dopaminergic neurotransmission associated with long-term tobacco and cannabis use have been poorly documented in vivo. In order to assess striatal and extra-striatal dopamine transporter (DAT) availability in tobacco and cannabis addiction, three groups of male age-matched subjects were compared: 11 healthy non-smoker subjects, 14 tobacco-dependent smokers (17.6 ± 5.3 cigarettes/day for 12.1 ± 8.5 years) and 13 cannabis and tobacco smokers (CTS) (4.8 ± 5.3 cannabis joints/day for 8.7 ± 3.9 years). DAT availability was examined by positron emission tomography with a high resolution research tomograph (HRRT) after injection of [11C]PE2I, a selective DAT radioligand. Region of interest and voxel-by-voxel approaches using a simplified reference tissue model were performed for the between-group comparison of DAT availability. Measurements in the dorsal striatum from both analyses were concordant and showed a mean 20% lower DAT availability in drug users compared with controls. Whole-brain analysis also revealed lower DAT availability in the ventral striatum, the midbrain, the middle cingulate and the thalamus (ranging from -15 to -30%). The DAT availability was slightly lower in all regions in CTS than in subjects who smoke tobacco only, but the difference did not reach significance. These results support the existence of a decrease in DAT availability associated with tobacco and cannabis addictions involving all dopaminergic brain circuits. These findings are consistent with the idea of a global decrease in cerebral DA activity in dependent subjects. (authors)

  19. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Directory of Open Access Journals (Sweden)

    Zhili He

    2018-02-01

    Full Text Available Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P < 0.05) as uranium or nitrate increased, and their changes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning.

  20. Microbial Functional Gene Diversity Predicts Groundwater Contamination and Ecosystem Functioning

    Science.gov (United States)

    Zhang, Ping; Wu, Linwei; Rocha, Andrea M.; Shi, Zhou; Wu, Bo; Qin, Yujia; Wang, Jianjun; Yan, Qingyun; Curtis, Daniel; Ning, Daliang; Van Nostrand, Joy D.; Wu, Liyou; Watson, David B.; Adams, Michael W. W.; Alm, Eric J.; Adams, Paul D.; Arkin, Adam P.

    2018-01-01

    ABSTRACT Contamination from anthropogenic activities has significantly impacted Earth’s biosphere. However, knowledge about how environmental contamination affects the biodiversity of groundwater microbiomes and ecosystem functioning remains very limited. Here, we used a comprehensive functional gene array to analyze groundwater microbiomes from 69 wells at the Oak Ridge Field Research Center (Oak Ridge, TN), representing a wide pH range and uranium, nitrate, and other contaminants. We hypothesized that the functional diversity of groundwater microbiomes would decrease as environmental contamination (e.g., uranium or nitrate) increased or at low or high pH, while some specific populations capable of utilizing or resistant to those contaminants would increase, and thus, such key microbial functional genes and/or populations could be used to predict groundwater contamination and ecosystem functioning. Our results indicated that functional richness/diversity decreased as uranium (but not nitrate) increased in groundwater. In addition, about 5.9% of specific key functional populations targeted by a comprehensive functional gene array (GeoChip 5) increased significantly (P < 0.05) as uranium or nitrate increased, and their changes could be used to successfully predict uranium and nitrate contamination and ecosystem functioning. This study indicates great potential for using microbial functional genes to predict environmental contamination and ecosystem functioning. PMID:29463661

  1. Predictive modelling of gene expression from transcriptional regulatory elements.

    Science.gov (United States)

    Budden, David M; Hurley, Daniel G; Crampin, Edmund J

    2015-07-01

    Predictive modelling of gene expression provides a powerful framework for exploring the regulatory logic underpinning transcriptional regulation. Recent studies have demonstrated the utility of such models in identifying dysregulation of gene and miRNA expression associated with abnormal patterns of transcription factor (TF) binding or nucleosomal histone modifications (HMs). Despite the growing popularity of such approaches, a comparative review of the various modelling algorithms and feature extraction methods is lacking. We define and compare three methods of quantifying pairwise gene-TF/HM interactions and discuss their suitability for integrating the heterogeneous chromatin immunoprecipitation (ChIP)-seq binding patterns exhibited by TFs and HMs. We then construct log-linear and ϵ-support vector regression models from various mouse embryonic stem cell (mESC) and human lymphoblastoid (GM12878) data sets, considering both ChIP-seq- and position weight matrix- (PWM)-derived in silico TF-binding. The two algorithms are evaluated both in terms of their modelling prediction accuracy and ability to identify the established regulatory roles of individual TFs and HMs. Our results demonstrate that TF-binding and HMs are highly predictive of gene expression as measured by mRNA transcript abundance, irrespective of algorithm or cell type selection and considering both ChIP-seq and PWM-derived TF-binding. As we encourage other researchers to explore and develop these results, our framework is implemented using open-source software and made available as a preconfigured bootable virtual environment. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
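    To make the log-linear model mentioned in the abstract above concrete, the sketch below fits log(expression + 1) as a linear function of log(binding score + 1) by ordinary least squares. It is only an illustration under stated assumptions: the function names, the pseudocount of 1, the two hypothetical binding features and the toy data are all invented here and are not taken from the authors' framework.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_log_linear(scores, expression):
    """Least-squares fit of log(expression + 1) on log(score + 1) plus an intercept."""
    x = [[1.0] + [math.log(s + 1.0) for s in row] for row in scores]
    y = [math.log(e + 1.0) for e in expression]
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    """Predict expression for one gene from its binding scores."""
    log_pred = coef[0] + sum(c * math.log(s + 1.0) for c, s in zip(coef[1:], row))
    return math.exp(log_pred) - 1.0


# Hypothetical toy data: two binding features per gene (illustrative only).
scores = [[0, 1], [1, 0], [3, 1], [7, 3], [1, 3], [3, 7]]
# Expression generated from known coefficients so the fit can be checked.
expression = [
    math.exp(0.5 + 1.0 * math.log(s1 + 1.0) - 0.5 * math.log(s2 + 1.0)) - 1.0
    for s1, s2 in scores
]
coef = fit_log_linear(scores, expression)  # intercept, then one weight per feature
```

    Because the toy expression values are generated from known coefficients, the fit should recover them up to rounding error. A positive weight would mark a feature associated with higher expression and a negative weight one associated with lower expression.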

  2. Identification of rat genes by TWINSCAN gene prediction, RT-PCR, and direct sequencing

    DEFF Research Database (Denmark)

    Wu, Jia Qian; Shteynberg, David; Arumugam, Manimozhiyan

    2004-01-01

    The publication of a draft sequence of a third mammalian genome--that of the rat--suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation efforts that are currently being devoted to human. In this paper, we demonstrate an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially...... in the single-intron experiment. Spliced sequences were amplified in 46 cases (34%). We conclude that this procedure for elucidating gene structures with native cDNA sequences is cost-effective and will become even more so as it is further optimized.

  3. Dinucleotide controlled null models for comparative RNA gene prediction

    Directory of Open Access Journals (Sweden)

    Gesell Tanja

    2008-05-01

    Full Text Available BACKGROUND: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. RESULTS: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. CONCLUSION: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered.

  4. Dinucleotide controlled null models for comparative RNA gene prediction.

    Science.gov (United States)

    Gesell, Tanja; Washietl, Stefan

    2008-05-27

    Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. 

  5. Identification of a robust gene signature that predicts breast cancer outcome in independent data sets

    International Nuclear Information System (INIS)

    Korkola, James E; Waldman, Frederic M; Blaveri, Ekaterina; DeVries, Sandy; Moore, Dan H II; Hwang, E Shelley; Chen, Yunn-Yi; Estep, Anne LH; Chew, Karen L; Jensen, Ronald H

    2007-01-01

    Breast cancer is a heterogeneous disease, presenting with a wide range of histologic, clinical, and genetic features. Microarray technology has shown promise in predicting outcome in these patients. We profiled 162 breast tumors using expression microarrays to stratify tumors based on gene expression. A subset of 55 tumors with extensive follow-up was used to identify gene sets that predicted outcome. The predictive gene set was further tested in previously published data sets. We used different statistical methods to identify three gene sets associated with disease free survival. A fourth gene set, consisting of 21 genes in common to all three sets, also had the ability to predict patient outcome. To validate the predictive utility of this derived gene set, it was tested in two published data sets from other groups. This gene set resulted in significant separation of patients on the basis of survival in these data sets, correctly predicting outcome in 62–65% of patients. By comparing outcome prediction within subgroups based on ER status, grade, and nodal status, we found that our gene set was most effective in predicting outcome in ER positive and node negative tumors. This robust gene selection with extensive validation has identified a predictive gene set that may have clinical utility for outcome prediction in breast cancer patients.

  6. Predictive gene testing for Huntington disease and other neurodegenerative disorders.

    Science.gov (United States)

    Wedderburn, S; Panegyres, P K; Andrew, S; Goldblatt, J; Liebeck, T; McGrath, F; Wiltshire, M; Pestell, C; Lee, J; Beilby, J

    2013-12-01

    Controversies exist around predictive testing (PT) programmes in neurodegenerative disorders. This study sets out to answer the following questions relating to Huntington disease (HD) and other neurodegenerative disorders: differences between these patients in their PT journeys, why and when individuals withdraw from PT, and decision-making processes regarding reproductive genetic testing. A case series analysis of patients having PT from the multidisciplinary Western Australian centre for PT over the past 20 years was performed using internationally recognised guidelines for predictive gene testing in neurodegenerative disorders. Of 740 at-risk patients, 518 applied for PT: 466 at risk of HD, 52 at risk of other neurodegenerative disorders - spinocerebellar ataxias, hereditary prion disease and familial Alzheimer disease. Thirteen percent withdrew from PT - 80.32% of withdrawals occurred during counselling stages. Major withdrawal reasons related to timing in the patients' lives or unknown as the patient did not disclose the reason. Thirty-eight HD individuals had reproductive genetic testing: 34 initiated prenatal testing (of which eight withdrew from the process) and four initiated pre-implantation genetic diagnosis. There was no recorded or other evidence of major psychological reactions or suicides during PT. People withdrew from PT in relation to life stages and reasons that are unknown. Our findings emphasise the importance of: (i) adherence to internationally recommended guidelines for PT; (ii) the role of the multidisciplinary team in risk minimisation; and (iii) patient selection. © 2013 The Authors; Internal Medicine Journal © 2013 Royal Australasian College of Physicians.

  7. Network-based prediction and knowledge mining of disease genes.

    Science.gov (United States)

    Carson, Matthew B; Lu, Hui

    2015-01-01

    In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. 
    In addition, we showed that first- and second-order neighbors in the PPI network...
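    Two of the network characteristics named in the abstract above, degree centrality and disease neighbor ratio, can be sketched on a toy interaction graph as follows. This is a minimal illustration, not the authors' pipeline: the protein labels P1 to P5, the edge list and the disease set are hypothetical, and the disease neighbor ratio is assumed here to mean the fraction of a protein's direct neighbors that are disease-associated.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def disease_neighbor_ratio(graph, disease_nodes):
    """Fraction of each node's direct neighbors that are disease-associated.

    Assumed definition for illustration; isolated nodes get 0.0.
    """
    return {
        node: (sum(1 for v in nbrs if v in disease_nodes) / len(nbrs)) if nbrs else 0.0
        for node, nbrs in graph.items()
    }


# Hypothetical toy protein-protein interaction network.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
disease = {"P2", "P4"}  # hypothetical disease-associated proteins

graph = build_graph(edges)
centrality = degree_centrality(graph)
ratio = disease_neighbor_ratio(graph, disease)
```

    In a real analysis, per-protein values such as these would form part of the feature vector passed to a classifier. The abstract reports an alternating decision tree, which is not reproduced here.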

  8. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Science.gov (United States)

    Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

    2012-01-01

    MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.

  9. MOCAT: a metagenomics assembly and gene prediction toolkit.

    Directory of Open Access Journals (Sweden)

    Jens Roat Kultima

    Full Text Available MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.

  10. A genome-wide gene function prediction resource for Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Han Yan

    2010-08-01

    Full Text Available Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.

  11. Identifying the Gene Signatures from Gene-Pathway Bipartite Network Guarantees the Robust Model Performance on Predicting the Cancer Prognosis

    Directory of Open Access Journals (Sweden)

    Li He

    2014-01-01

    Full Text Available For the purpose of improving the prediction of cancer prognosis in the clinical researches, various algorithms have been developed to construct the predictive models with the gene signatures detected by DNA microarrays. Due to the heterogeneity of the clinical samples, the list of differentially expressed genes (DEGs) generated by the statistical methods or the machine learning algorithms often involves a number of false positive genes, which are not associated with the phenotypic differences between the compared clinical conditions, and subsequently impacts the reliability of the predictive models. In this study, we proposed a strategy, which combined the statistical algorithm with the gene-pathway bipartite networks, to generate the reliable lists of cancer-related DEGs and constructed the models by using support vector machine for predicting the prognosis of three types of cancers, namely, breast cancer, acute myeloma leukemia, and glioblastoma. Our results demonstrated that, combined with the gene-pathway bipartite networks, our proposed strategy can efficiently generate the reliable cancer-related DEG lists for constructing the predictive models. In addition, the model performance in the swap analysis was similar to that in the original analysis, indicating the robustness of the models in predicting the cancer outcomes.

  12. Benchmarking of gene prediction programs for metagenomic data.

    Science.gov (United States)

    Yok, Non; Rosen, Gail

    2010-01-01

    This manuscript presents the most rigorous benchmarking of gene annotation algorithms for metagenomic datasets to date. We compare three different programs: GeneMark, MetaGeneAnnotator (MGA) and Orphelia. The comparisons are based on their performances over simulated fragments from one hundred species of diverse lineages. We defined four different types of fragments; two types come from the inter- and intra-coding regions and the other types are from the gene edges. Hoff et al. used only 12 species in their comparison; therefore, their sample is too small to represent an environmental sample. Also, no predecessor has separately examined fragments that contain gene edges as opposed to intra-coding regions. General observations in our results are that the performance of all these programs improves as we increase the length of the fragment. On the other hand, intra-coding fragments of our data show low annotation error in all of the programs compared to the gene edge fragments. Overall, we found an upper-bound performance by combining all the methods.

  13. A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP Inhibitors

    Science.gov (United States)

    2017-02-01

    ...affecting the function of Fanconi Anemia (FA) genes (FANCA/B/C/D2/E/F/G/I/J/L/M, PALB2) or DNA damage response genes involved in HR (ATM, ATR... Award Number: W81XWH-10-1-0585. Title: A Gene Expression Profile of BRCAness That Predicts for Responsiveness to Platinum and PARP Inhibitors. Dates covered: 15 July 2010 – 2 Nov. 2016.

  14. Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data

    Directory of Open Access Journals (Sweden)

    Teng Shaolei

    2013-01-01

    Full Text Available BACKGROUND: Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time consuming and difficult. The accurate predictions of tissue-specific gene targets could provide useful information for biomarker development and drug target identification. RESULTS: In this study, we have developed a machine learning approach for predicting the human tissue-specific genes using microarray expression data. The lists of known tissue-specific genes for different tissues were collected from UniProt database, and the expression data retrieved from the previously compiled dataset according to the lists were used for input vector encoding. Random Forests (RFs) and Support Vector Machines (SVMs) were used to construct accurate classifiers. The RF classifiers were found to outperform SVM models for tissue-specific gene prediction. The results suggest that the candidate genes for brain or liver specific expression can provide valuable information for further experimental studies. Our approach was also applied for identifying tissue-selective gene targets for different types of tissues. CONCLUSIONS: A machine learning approach has been developed for accurately identifying the candidate genes for tissue specific/selective expression. The approach provides an efficient way to select some interesting genes for developing new biomedical markers and improve our knowledge of tissue-specific expression.

  15. Predictive value of MSH2 gene expression in colorectal cancer treated with capecitabine

    DEFF Research Database (Denmark)

    Jensen, Lars H; Danenberg, Kathleen D; Danenberg, Peter V

    2007-01-01

    PURPOSE: The objective of the present study was to evaluate the gene expression of the DNA mismatch repair gene MSH2 as a predictive marker in advanced colorectal cancer (CRC) treated with first-line capecitabine. PATIENTS AND METHODS: Microdissection of paraffin-embedded tumor tissue, RNA...... was associated with a hazard ratio of 0.5 (95% confidence interval, 0.23-1.11; P = 0.083) in survival analysis. CONCLUSION: The higher gene expression of MSH2 in responders and the trend for predicting overall survival indicate a predictive value of this marker in the treatment of advanced CRC with capecitabine.

  16. Testing the predictive value of peripheral gene expression for nonremission following citalopram treatment for major depression.

    Science.gov (United States)

    Guilloux, Jean-Philippe; Bassi, Sabrina; Ding, Ying; Walsh, Chris; Turecki, Gustavo; Tseng, George; Cyranowski, Jill M; Sibille, Etienne

    2015-02-01

    Major depressive disorder (MDD) in general, and anxious-depression in particular, are characterized by poor rates of remission with first-line treatments, contributing to the chronic illness burden suffered by many patients. Prospective research is needed to identify the biomarkers predicting nonremission prior to treatment initiation. We collected blood samples from a discovery cohort of 34 adult MDD patients with co-occurring anxiety and 33 matched, nondepressed controls at baseline and after 12 weeks (of citalopram plus psychotherapy treatment for the depressed cohort). Samples were processed on gene arrays and group differences in gene expression were investigated. Exploratory analyses suggest that at pretreatment baseline, nonremitting patients differ from controls with gene function and transcription factor analyses potentially related to elevated inflammation and immune activation. In a second phase, we applied an unbiased machine learning prediction model and corrected for model-selection bias. Results show that baseline gene expression predicted nonremission with 79.4% corrected accuracy with a 13-gene model. The same gene-only model predicted nonremission after 8 weeks of citalopram treatment with 76% corrected accuracy in an independent validation cohort of 63 MDD patients treated with citalopram at another institution. Together, these results demonstrate the potential, but also the limitations, of baseline peripheral blood-based gene expression to predict nonremission after citalopram treatment. These results not only support their use in future prediction tools but also suggest that increased accuracy may be obtained with the inclusion of additional predictors (eg, genetics and clinical scales).

  17. Exploring gene expression signatures for predicting disease free survival after resection of colorectal cancer liver metastases.

    Directory of Open Access Journals (Sweden)

    Nikol Snoeren

    Full Text Available BACKGROUND AND OBJECTIVES: This study was designed to identify and validate gene signatures that can predict disease free survival (DFS) in patients undergoing a radical resection for their colorectal liver metastases (CRLM). METHODS: Tumor gene expression profiles were collected from 119 patients undergoing surgery for their CRLM in the Paul Brousse Hospital (France) and the University Medical Center Utrecht (The Netherlands). Patients were divided into high and low risk groups. A randomly selected training set was used to find predictive gene signatures. The ability of these gene signatures to predict DFS was tested in an independent validation set comprising the remaining patients. Furthermore, 5 known clinical risk scores were tested in our complete patient cohort. RESULT: No gene signature was found that significantly predicted DFS in the validation set. In contrast, three out of five clinical risk scores were able to predict DFS in our patient cohort. CONCLUSIONS: No gene signature was found that could predict DFS in patients undergoing CRLM resection. Three out of five clinical risk scores were able to predict DFS in our patient cohort. These results emphasize the need for validating risk scores in independent patient groups and suggest improved designs for future studies.

  18. Effect of scatter correction on the compartmental measurement of striatal and extrastriatal dopamine D2 receptors using [123I]epidepride SPET

    International Nuclear Information System (INIS)

    Fujita, Masahiro; Seneca, Nicholas; Innis, Robert B.; Varrone, Andrea; Kim, Kyeong Min; Watabe, Hiroshi; Iida, Hidehiro; Zoghbi, Sami S.; Tipre, Dnyanesh; Seibyl, John P.

    2004-01-01

    Prior studies with anthropomorphic phantoms and single, static in vivo brain images have demonstrated that scatter correction significantly improves the accuracy of regional quantitation of single-photon emission tomography (SPET) brain images. Since the regional distribution of activity changes following a bolus injection of a typical neuroreceptor ligand, we examined the effect of scatter correction on the compartmental modeling of serial dynamic images of striatal and extrastriatal dopamine D2 receptors using [123I]epidepride. Eight healthy human subjects [age 30±8 (range 22-46) years] participated in a study with a bolus injection of 373±12 (354-389) MBq [123I]epidepride and data acquisition over a period of 14 h. A transmission scan was obtained in each study for attenuation and scatter correction. Distribution volumes were calculated by means of compartmental nonlinear least-squares analysis using metabolite-corrected arterial input function and brain data processed with scatter correction using narrow-beam geometry μ (SC) and without scatter correction using broad-beam μ (NoSC). Effects of SC were markedly different among brain regions. SC increased activities in the putamen and thalamus after 1-1.5 h while it decreased activity during the entire experiment in the temporal cortex and cerebellum. Compared with NoSC, SC significantly increased specific distribution volume in the putamen (58%, P=0.0001) and thalamus (23%, P=0.0297). Compared with NoSC, SC made regional distribution of the specific distribution volume closer to that of [18F]fallypride. It is concluded that SC is required for accurate quantification of distribution volumes of receptor ligands in SPET studies. (orig.)

  19. Effect of scatter correction on the compartmental measurement of striatal and extrastriatal dopamine D2 receptors using [123I]epidepride SPET

    Energy Technology Data Exchange (ETDEWEB)

    Fujita, Masahiro; Seneca, Nicholas; Innis, Robert B. [Department of Psychiatry, Yale University School of Medicine and VA Connecticut Healthcare System, West Haven, CT (United States); Molecular Imaging Branch, National Institute of Mental Health, Bethesda, MD (United States)]; Varrone, Andrea [Department of Psychiatry, Yale University School of Medicine and VA Connecticut Healthcare System, West Haven, CT (United States); Biostructure and Bioimaging Institute, National Research Council, Napoli (Italy)]; Kim, Kyeong Min; Watabe, Hiroshi; Iida, Hidehiro [Department of Investigative Radiology, National Cardiovascular Center Research Institute, Osaka (Japan)]; Zoghbi, Sami S. [Department of Psychiatry, Yale University School of Medicine and VA Connecticut Healthcare System, West Haven, CT (United States); Molecular Imaging Branch, National Institute of Mental Health, Bethesda, MD (United States); Department of Radiology, Yale University School of Medicine and VA Connecticut Healthcare System, West Haven, CT (United States)]; Tipre, Dnyanesh [Molecular Imaging Branch, National Institute of Mental Health, Bethesda, MD (United States)]; Seibyl, John P. [Institute for Neurodegenerative Disorders, New Haven, CT (United States)]

    2004-05-01

    Prior studies with anthropomorphic phantoms and single, static in vivo brain images have demonstrated that scatter correction significantly improves the accuracy of regional quantitation of single-photon emission tomography (SPET) brain images. Since the regional distribution of activity changes following a bolus injection of a typical neuroreceptor ligand, we examined the effect of scatter correction on the compartmental modeling of serial dynamic images of striatal and extrastriatal dopamine D2 receptors using [123I]epidepride. Eight healthy human subjects [age 30±8 (range 22-46) years] participated in a study with a bolus injection of 373±12 (354-389) MBq [123I]epidepride and data acquisition over a period of 14 h. A transmission scan was obtained in each study for attenuation and scatter correction. Distribution volumes were calculated by means of compartmental nonlinear least-squares analysis using metabolite-corrected arterial input function and brain data processed with scatter correction using narrow-beam geometry μ (SC) and without scatter correction using broad-beam μ (NoSC). Effects of SC were markedly different among brain regions. SC increased activities in the putamen and thalamus after 1-1.5 h while it decreased activity during the entire experiment in the temporal cortex and cerebellum. Compared with NoSC, SC significantly increased specific distribution volume in the putamen (58%, P=0.0001) and thalamus (23%, P=0.0297). Compared with NoSC, SC made regional distribution of the specific distribution volume closer to that of [18F]fallypride. It is concluded that SC is required for accurate quantification of distribution volumes of receptor ligands in SPET studies. (orig.)

  20. Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine-containing genes

    DEFF Research Database (Denmark)

    Have, Christian Theil; Zambach, Sine; Christiansen, Henning

    2013-01-01

    for prediction of pyrrolysine incorporating genes in genomes of bacteria and archaea leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates...... for experimental verification. The method is implemented as a computational pipeline which is available on request....

  1. Effects of sample size on robustness and prediction accuracy of a prognostic gene signature

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2009-05-01

    Background: Little overlap between independently developed gene signatures and poor inter-study applicability of gene signatures are two major concerns raised in the development of microarray-based prognostic gene signatures. One recent study suggested that thousands of samples are needed to generate a robust prognostic gene signature. Results: A data set of 1,372 samples was generated by combining eight breast cancer gene expression data sets produced using the same microarray platform, and this data set was used to investigate the effects of varying sample sizes on several performance measures of a prognostic gene signature. The overlap between independently developed gene signatures increased linearly with more samples, attaining an average overlap of 16.56% with 600 samples. The concordance between outcomes predicted by different gene signatures also increased with more samples, up to 94.61% with 300 samples. The accuracy of outcome prediction also increased with more samples. Finally, analysis using only Estrogen Receptor-positive (ER+) patients attained higher prediction accuracy than analysis using all patients, suggesting that sub-type specific analysis can lead to the development of better prognostic gene signatures. Conclusion: Increasing sample sizes generated a gene signature with better stability, better concordance in outcome prediction, and better prediction accuracy. However, the degree of performance improvement with increased sample size differed between the degree of overlap and the degree of concordance in outcome prediction, suggesting that the sample size required for a study should be determined according to the specific aims of the study.
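
    The overlap figure quoted in this abstract can be made concrete with a small sketch. The abstract does not state how overlap was defined, so the function below is an assumption: it counts the genes two signatures share as a percentage of the smaller signature. The function name and the gene identifiers are hypothetical.

```python
def signature_overlap(sig_a, sig_b):
    """Percentage of genes shared by two signatures.

    Assumption: overlap is measured relative to the smaller signature;
    the source abstract does not give its exact definition.
    """
    a, b = set(sig_a), set(sig_b)
    if not a or not b:
        raise ValueError("signatures must be non-empty")
    return 100.0 * len(a & b) / min(len(a), len(b))


# Hypothetical gene identifiers, for illustration only.
signature_1 = ["G1", "G2", "G3", "G4"]
signature_2 = ["G3", "G4", "G5", "G6"]
print(signature_overlap(signature_1, signature_2))
```

    With signatures drawn from independent training sets of increasing size, a measure of this kind could be tracked to see how stability changes with sample size.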

  2. Bioinformatics analysis of the predicted polyprenol reductase genes in higher plants

    Science.gov (United States)

    Basyuni, M.; Wati, R.

    2018-03-01

    The present study uses bioinformatics methods to analyze twenty-four predicted polyprenol reductase genes from higher plants in GenBank and to predict their structure, composition, similarity, subcellular localization, and phylogeny. The physicochemical properties varied among the observed plant polyprenol genes. The proportions of secondary structure elements in plant polyprenol genes followed the order α helix > random coil > extended chain structure. The predicted values for chloroplast transit peptides, but not for signal peptides, were low, indicating that few plant polyprenol reductase genes encode a chloroplast transit peptide. The likelihood of a potential transit peptide varied among the plant polyprenol reductases, suggesting the importance of understanding the variety of peptide components of plant polyprenol genes. To clarify this finding, a phylogenetic tree was drawn. The tree shows several branches, suggesting that plant polyprenol reductase genes group into divergent clusters.

  3. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    Science.gov (United States)

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  4. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

    Science.gov (United States)

    Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

    2017-11-24

    Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.

  5. Prediction of highly expressed genes in microbes based on chromatin accessibility

    DEFF Research Database (Denmark)

    Willenbrock, Hanni; Ussery, David

    2007-01-01

    BACKGROUND: It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed...

  6. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Directory of Open Access Journals (Sweden)

    Assaf Gottlieb

    2017-11-01

    Background: Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into “gene level” effects. Methods: Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results: We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions: Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  7. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    Science.gov (United States)

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigation of essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning was introduced to predict essential genes. We focused on 25 bacteria that have characterized essential genes. The predictions yielded a highest area under the receiver operating characteristic (ROC) curve (AUC) of 0.9716 in a tenfold cross-validation test. Appropriate features were used to construct models for making predictions in distantly related bacteria. The accuracy of predictions was evaluated by the consistency between the predictions and the known essential genes of the target species. The highest AUC of 0.9552 and an average AUC of 0.8314 were achieved when making predictions across organisms. A recently released independent dataset from Synechococcus elongatus was obtained for further assessment of the performance of our model. The AUC of these predictions is 0.7855, which is higher than that of other methods. This research shows that features obtained by homology mapping alone can achieve results as good as, or even better than, those of integrated features. The work also indicates that a machine learning-based method can assign more effective weight coefficients than an empirical formula based on biological knowledge.
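
    The AUC values reported in this abstract summarise how well a scoring model ranks essential genes above non-essential ones. As a minimal sketch of that quantity (not the authors' pipeline), the function below computes AUC directly from its rank interpretation: the fraction of (essential, non-essential) pairs in which the essential gene receives the higher score, with ties counted as half. The function name, scores, and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative.

    scores: predicted essentiality scores (higher = more likely essential)
    labels: 1 for essential, 0 for non-essential
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # correctly ordered pair
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four genes, two of them essential.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

    The pairwise loop is quadratic and is meant only to make the definition explicit; a sorted-rank implementation would be preferable for genome-scale gene lists.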

  8. Genome-wide targeted prediction of ABA responsive genes in rice based on over-represented cis-motif in co-expressed genes.

    Science.gov (United States)

    Lenka, Sangram K; Lohia, Bikash; Kumar, Abhay; Chinnusamy, Viswanathan; Bansal, Kailash C

    2009-02-01

    Abscisic acid (ABA), the popular plant stress hormone, plays a key role in the regulation of a subset of stress-responsive genes. These genes respond to ABA through specific transcription factors which bind to cis-regulatory elements present in their promoters. We discovered the CGMCACGTGB motif, which contains the ABA Responsive Element (ABRE) core (ACGT), as an over-represented motif among the promoters of ABA-responsive co-expressed genes in rice. A targeted gene prediction strategy using this motif led to the identification of 402 protein-coding genes potentially regulated by an ABA-dependent molecular genetic network. RT-PCR analysis of 45 genes chosen arbitrarily from the predicted 402 genes confirmed 80% accuracy of our prediction. Plant Gene Ontology (GO) analysis of ABA-responsive genes showed enrichment of signal transduction and stress-related genes among diverse functional categories.
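
    The motif CGMCACGTGB is written with IUPAC degenerate nucleotide codes, in which M stands for A or C and B stands for C, G, or T. A minimal sketch of how such a motif could be located in promoter sequences is given below; it is not the authors' prediction strategy, it scans only the forward strand, and the function names and the example sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def find_motif(sequence, motif="CGMCACGTGB"):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical promoter fragment, for illustration only.
print(find_motif("TTCGACACGTGCAA"))
```

    Scanning the reverse complement as well would be a natural extension for a real promoter screen.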

  9. A seven-gene CpG-island methylation panel predicts breast cancer progression

    International Nuclear Information System (INIS)

    Li, Yan; Melnikov, Anatoliy A.; Levenson, Victor; Guerra, Emanuela; Simeone, Pasquale; Alberti, Saverio; Deng, Youping

    2015-01-01

    DNA methylation regulates gene expression, through the inhibition/activation of gene transcription of methylated/unmethylated genes. Hence, DNA methylation profiling can capture pivotal features of gene expression in cancer tissues from patients at the time of diagnosis. In this work, we analyzed a breast cancer case series, to identify DNA methylation determinants of metastatic versus non-metastatic tumors. CpG-island methylation was evaluated on a 56-gene cancer-specific biomarker microarray in metastatic versus non-metastatic breast cancers in a multi-institutional case series of 123 breast cancer patients. Global statistical modeling and unsupervised hierarchical clustering were applied to identify a multi-gene binary classifier with high sensitivity and specificity. Network analysis was utilized to quantify the connectivity of the identified genes. Seven genes (BRCA1, DAPK1, MSH2, CDKN2A, PGR, PRKCDBP, RANKL) were found informative for prognosis of metastatic diffusion and were used to calculate classifier accuracy versus the entire data-set. Individual-gene performances showed sensitivities of 63–79 %, 53–84 % specificities, positive predictive values of 59–83 % and negative predictive values of 63–80 %. When modelled together, these seven genes reached a sensitivity of 93 %, 100 % specificity, a positive predictive value of 100 % and a negative predictive value of 93 %, with high statistical power. Unsupervised hierarchical clustering independently confirmed these findings, in close agreement with the accuracy measurements. Network analyses indicated tight interrelationship between the identified genes, suggesting this to be a functionally-coordinated module, linked to breast cancer progression. Our findings identify CpG-island methylation profiles with deep impact on clinical outcome, paving the way for use as novel prognostic assays in clinical settings. The online version of this article (doi:10.1186/s12885-015-1412-9) contains supplementary
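
    Sensitivity, specificity, and the positive and negative predictive values quoted in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the usage example are hypothetical and are not the study's data.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classifier metrics from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # metastatic cases detected
        "specificity": _ratio(tn, tn + fp),  # non-metastatic cases cleared
        "ppv": _ratio(tp, tp + fp),          # positive calls that are correct
        "npv": _ratio(tn, tn + fn),          # negative calls that are correct
    }


# Hypothetical counts, for illustration only.
for name, value in diagnostic_metrics(tp=40, fp=5, tn=45, fn=10).items():
    print(f"{name}: {value:.3f}")
```

    Predictive values depend on the proportion of metastatic cases in the series, so they do not transfer directly to cohorts with a different case mix.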

  10. Concerted down-regulation of immune-system related genes predicts metastasis in colorectal carcinoma

    International Nuclear Information System (INIS)

    Fehlker, Marion; Huska, Matthew R; Jöns, Thomas; Andrade-Navarro, Miguel A; Kemmner, Wolfgang

    2014-01-01

    This study aimed at the identification of prognostic gene expression markers in early primary colorectal carcinomas without metastasis at the time point of surgery by analyzing genome-wide gene expression profiles using oligonucleotide microarrays. Cryo-conserved tumor specimens from 45 patients with early colorectal cancers were examined, with the majority of them being UICC stage II or earlier and with a follow-up time of 41–115 months. Gene expression profiling was performed using Whole Human Genome 4x44K Oligonucleotide Microarrays. Validation of microarray data was performed on five of the genes in a smaller cohort. Using a novel algorithm based on the recursive application of support vector machines (SVMs), we selected a signature of 44 probes that discriminated between patients developing later metastasis and patients with a good prognosis. Interestingly, almost half of the genes were related to the patients’ immune response and showed reduced expression in the metastatic cases. Whereas gene signatures described so far for the prediction of metastasis in CRC contain genes with various biological functions, in this study metastasis could be well predicted by a set of gene expression markers consisting exclusively of genes related to the MHC class II complex involved in immune response. Thus, our data emphasize that the proper function of a comprehensive network of immune response genes is of vital importance for the survival of colorectal cancer patients

  11. Genomic Features That Predict Allelic Imbalance in Humans Suggest Patterns of Constraint on Gene Expression Variation

    Science.gov (United States)

    Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.

    2009-01-01

    Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary

  12. Prediction of highly expressed genes in microbes based on chromatin accessibility

    Directory of Open Access Journals (Sweden)

    Ussery David W

    2007-02-01

    Background: It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes. We compare these predictions with those based on codon adaptation index (CAI) values, and also with experimental data for 6 different microbial genomes, with a particular interest in experimental data from Escherichia coli. Moreover, position preference is examined further in 328 sequenced microbial genomes. Results: We find that absolute gene expression levels are correlated with the position preference in many microbial genomes. It is postulated that in these regions, the DNA may be more accessible to the transcriptional machinery. Moreover, ribosomal proteins and ribosomal RNA are encoded by DNA having significantly lower position preference values than other genes in fast-replicating microbes. Conclusion: This insight into DNA structure-dependent gene expression in microbes may be exploited for predicting the expression of non-translated genes such as non-coding RNAs that may not be predicted by any of the conventional codon usage bias approaches.

  13. Accurate prediction of secondary metabolite gene clusters in filamentous fungi

    DEFF Research Database (Denmark)

    Andersen, Mikael Rørdam; Nielsen, Jakob Blæsbjerg; Klitgaard, Andreas

    2013-01-01

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify...... used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom....

  14. Gene prediction and RFX transcriptional regulation analysis using comparative genomics

    OpenAIRE

    Chu, Jeffrey Shih Chieh

    2011-01-01

    Regulatory Factor X (RFX) is a family of transcription factors (TF) that is conserved in all metazoans, in some fungi, and in only a few single-cellular organisms. Seven members are found in mammals, nine in fishes, three in fruit flies, and a single member in nematodes and fungi. RFX is involved in many different roles in humans, but a particular function that is conserved in many metazoans is its regulation of ciliogenesis. Probing over 150 genomes for the presence of RFX and ciliary genes ...

  15. Clustering gene expression data based on predicted differential effects of GV interaction.

    Science.gov (United States)

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias the interpretation when identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and demonstrated the utility of our approach using an external validation.

  16. Pathway analysis of gene signatures predicting metastasis of node-negative primary breast cancer

    International Nuclear Information System (INIS)

    Yu, Jack X; Sieuwerts, Anieta M; Zhang, Yi; Martens, John WM; Smid, Marcel; Klijn, Jan GM; Wang, Yixin; Foekens, John A

    2007-01-01

    Published prognostic gene signatures in breast cancer have few genes in common. Here we provide a rationale for this observation by studying the prognostic power and the underlying biological pathways of different gene signatures. Gene signatures to predict the development of metastases in estrogen receptor-positive and estrogen receptor-negative tumors were identified using 500 re-sampled training sets and mapping to Gene Ontology Biological Process to identify over-represented pathways. The Global Test program confirmed that gene expression profiles in the common pathways were associated with the metastasis of the patients. The apoptotic pathway and cell division, or cell growth regulation and G-protein coupled receptor signal transduction, were most significantly associated with the metastatic capability of estrogen receptor-positive or estrogen receptor-negative tumors, respectively. A gene signature derived from the common pathways predicted metastasis in an independent cohort. Mapping of the pathways represented by different published prognostic signatures showed that they share 53% of the identified pathways. We show that divergent gene sets classifying patients for the same clinical endpoint represent similar biological processes and that pathway-derived signatures can be used to predict prognosis. Furthermore, our study reveals that the underlying biology related to aggressiveness of estrogen receptor subgroups of breast cancer is quite different

  17. Analysis and prediction of gene splice sites in four Aspergillus genomes

    DEFF Research Database (Denmark)

    Wang, Kai; Ussery, David; Brunak, Søren

    2009-01-01

    Several Aspergillus fungal genomic sequences have been published, with many more in progress. Obviously, it is essential to have high-quality, consistently annotated sets of proteins from each of the genomes, in order to make meaningful comparisons. We have developed a dedicated, publicly available......, splice site prediction program called NetAspGene, for the genus Aspergillus. Gene sequences from Aspergillus fumigatus, the most common mould pathogen, were used to build and test our model. Compared to many animals and plants, Aspergillus contains smaller introns; thus we have applied a larger window...... better splice site prediction than other available tools. NetAspGene will be very helpful for the study in Aspergillus splice sites and especially in alternative splicing. A webpage for NetAspGene is publicly available at http://www.cbs.dtu.dk/services/NetAspGene....

  18. Oxytocin receptor gene variation predicts subjective responses to MDMA.

    Science.gov (United States)

    Bershad, Anya K; Weafer, Jessica J; Kirkpatrick, Matthew G; Wardle, Margaret C; Miller, Melissa A; de Wit, Harriet

    2016-12-01

    3,4-Methylenedioxymethamphetamine (MDMA, "ecstasy") enhances desire to socialize and feelings of empathy, which are thought to be related to increased oxytocin levels. Thus, variation in the oxytocin receptor gene (OXTR) may influence responses to the drug. Here, we examined the influence of a single nucleotide polymorphism (SNP) in OXTR on responses to MDMA in humans. Based on findings that carriers of the A allele at rs53576 exhibit reduced sensitivity to oxytocin-induced social behavior, we hypothesized that these individuals would show reduced subjective responses to MDMA, including sociability. In this three-session, double-blind, within-subjects study, healthy volunteers with past MDMA experience (N = 68) received MDMA (0, 0.75 mg/kg, and 1.5 mg/kg) and provided self-report ratings of sociability, anxiety, and drug effects. These responses were examined in relation to rs53576. MDMA (1.5 mg/kg) did not increase sociability in individuals with the A/A genotype as it did in G allele carriers. The genotypic groups did not differ in responses at the lower MDMA dose, or in cardiovascular or other subjective responses. These findings are consistent with the idea that MDMA-induced sociability is mediated by oxytocin, and that variation in the oxytocin receptor gene may influence responses to the drug.

  19. Reduce manual curation by combining gene predictions from multiple annotation engines, a case study of start codon prediction.

    Directory of Open Access Journals (Sweden)

    Thomas H A Ederveen

    Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines (AGEs) provide users a straightforward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF) calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35-52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating the likelihood of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs whose start codons are notoriously difficult to predict. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%), and with an optimal path, ORF start prediction sensitivity is gained while maintaining a high specificity.
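
    The idea of walking through AGE combinations in a fixed order, from high to low specificity, can be sketched as follows. This is one possible reading of the consensus-path description, not the authors' implementation: it assumes a combination "fires" when all of its engines report the same start coordinate, and that the projected confidence of that combination is then attached to the call. Engine names, coordinates, and confidence values are hypothetical.

```python
def consensus_start(predictions, path):
    """Pick an ORF start codon by walking an ordered path of engine combinations.

    predictions: dict mapping engine name -> predicted start coordinate
                 (an engine missing from the dict made no call)
    path: list of (engines, projected_confidence) tuples, ordered from
          high to low specificity
    Returns (start, projected_confidence), or (None, 0.0) if nothing agrees.
    """
    for engines, confidence in path:
        starts = {predictions.get(engine) for engine in engines}
        # Assumption: a combination is accepted only when every engine in it
        # made a call and all calls are identical.
        if len(starts) == 1 and None not in starts:
            return starts.pop(), confidence
    return None, 0.0


# Hypothetical engines and values, for illustration only.
calls = {"engineA": 100, "engineB": 100, "engineC": 130}
path = [
    (("engineA", "engineB", "engineC"), 0.95),
    (("engineA", "engineB"), 0.90),
    (("engineC",), 0.70),
]
print(consensus_start(calls, path))
```

    Unlike majority voting, a path of this kind lets a highly specific pair of engines override a less reliable third engine, at the cost of needing curated genomes to rank the combinations.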

  20. Predicting Genes Involved in Human Cancer Using Network Contextual Information

    Directory of Open Access Journals (Sweden)

    Rahmani Hossein

    2012-03-01

    Protein-Protein Interaction (PPI) networks have been widely used for the task of predicting proteins involved in cancer. Previous research has shown that functional information about the protein for which a prediction is made, proximity to specific other proteins in the PPI network, as well as local network structure are informative features in this respect. In this work, we introduce two new types of input features, reflecting additional information: (1) Functional Context: the functions of proteins interacting with the target protein (rather than the protein itself); and (2) Structural Context: the relative position of the target protein with respect to specific other proteins selected according to a novel ANOVA (analysis of variance) based measure. We also introduce a selection strategy to pinpoint the most informative features. Results show that the proposed feature types and feature selection strategy yield informative features. A standard machine learning method (Naive Bayes) that uses the features proposed here outperforms the current state-of-the-art methods by more than 5% with respect to F-measure. In addition, manual inspection confirms the biological relevance of the top-ranked features.

  1. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Background: Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  2. An approach for reduction of false predictions in reverse engineering of gene regulatory networks.

    Science.gov (United States)

    Khan, Abhinandan; Saha, Goutam; Pal, Rajat Kumar

    2018-05-14

    A gene regulatory network discloses the regulatory interactions amongst genes under a particular condition of the human body. The accurate reconstruction of such networks from time-series gene expression data using computational tools poses a stiff challenge for contemporary computer scientists. This is crucial to facilitate the understanding of the proper functioning of a living organism. Unfortunately, the computational methods produce many unwanted false predictions along with the correct ones. Investigations in the domain focus on identifying as many correct regulations as possible in the reverse engineering of gene regulatory networks, to make the reconstruction more reliable and biologically relevant. One way to achieve this is to reduce the number of incorrect predictions in the reconstructed networks. In the present investigation, we have proposed a novel scheme to decrease the number of false predictions by suitably combining several metaheuristic techniques. We have also implemented the scheme using a dataset ensemble approach (i.e. combining multiple datasets). We have employed the proposed methodology on real-world experimental datasets of the SOS DNA Repair network of Escherichia coli and the IRMA network of Saccharomyces cerevisiae. Subsequently, we have experimented upon somewhat larger, in silico networks, namely, DREAM3 and DREAM4 Challenge networks, and 15-gene and 20-gene networks extracted from the GeneNetWeaver database. To study the effect of multiple datasets on the quality of the inferred networks, we have used four datasets in each experiment. The obtained results are encouraging, as the proposed methodology can reduce the number of false predictions significantly for larger gene regulatory networks without using any supplementary prior biological information. It is also observed that if a small amount of prior biological information is incorporated here, the results improve further w.r.t. the prediction of true positives.

  3. Adipose gene expression prior to weight loss can differentiate and weakly predict dietary responders.

    Directory of Open Access Journals (Sweden)

    David M Mutch

    Full Text Available BACKGROUND: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. METHODOLOGY/PRINCIPAL FINDINGS: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Several statistical tests revealed that the gene expression profiles of responders (8-12 kg weight loss) could always be differentiated from those of non-responders (<4 kg weight loss). We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box) approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1% +/- 8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier) improved prediction accuracy to 80.9% +/- 2.2%. CONCLUSION: Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition.

  4. Phylogenomic detection and functional prediction of genes potentially important for plant meiosis.

    Science.gov (United States)

    Zhang, Luoyan; Kong, Hongzhi; Ma, Hong; Yang, Ji

    2018-02-15

    Meiosis is a specialized type of cell division necessary for sexual reproduction in eukaryotes. A better understanding of the cytological procedures of meiosis has been achieved by comprehensive cytogenetic studies in plants, while the genetic mechanisms regulating meiotic progression remain incompletely understood. The increasing accumulation of complete genome sequences and large-scale gene expression datasets has provided a powerful resource for phylogenomic inference and unsupervised identification of genes involved in plant meiosis. By integrating sequence homology and expression data, 164, 131, 124 and 162 genes potentially important for meiosis were identified in the genomes of Arabidopsis thaliana, Oryza sativa, Selaginella moellendorffii and Pogonatum aloides, respectively. The predicted genes were assigned to 45 meiotic GO terms, and their functions were related to different processes occurring during meiosis in various organisms. Most of the predicted meiotic genes underwent lineage-specific duplication events during plant evolution, with about 30% of the predicted genes retaining only a single copy in higher plant genomes. The results of this study provided clues to design experiments for better functional characterization of meiotic genes in plants, promoting the phylogenomic approach to the evolutionary dynamics of the plant meiotic machineries. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. The integration of weighted human gene association networks based on link prediction.

    Science.gov (United States)

    Yang, Jian; Yang, Tinghong; Wu, Duzhi; Lin, Limei; Yang, Fan; Zhao, Jing

    2017-01-31

    Physical and functional interplays between genes or proteins have important biological meaning for cellular functions. Some efforts have been made to construct weighted gene association meta-networks by integrating multiple biological resources, where the weight indicates the confidence of the interaction. However, these existing human gene association networks share only a limited number of interactions, suggesting that they are incomplete and noisy. Here we propose a workflow to construct a weighted human gene association network using information from six existing networks, including two weighted specific PPI networks and four gene association meta-networks. We applied a link prediction algorithm to predict possible missing links in the networks, used a cross-validation approach to refine each network, and finally integrated the refined networks into a final integrated network. The common information among the refined networks increases notably, suggesting higher reliability. Our final integrated network has many more links than most of the original networks, while its links retain high functional relevance. Used as the background network in a case study of disease gene prediction, the final integrated network performs well, implying its reliability and practical significance. Our workflow could be insightful for integrating and refining existing gene association data.
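
    The link-prediction step in this record can be illustrated with a minimal sketch. The abstract does not name the algorithm used, so the weighted common-neighbours score below is an assumption chosen for simplicity; the function name, gene labels and weights are hypothetical.

```python
from itertools import combinations


def common_neighbour_scores(edges):
    """Score every unlinked gene pair through its shared neighbours.

    edges: dict mapping frozenset({gene_a, gene_b}) -> confidence weight.
    Returns a list of (score, gene_a, gene_b), highest score first.
    """
    # Build an adjacency map: gene -> {neighbour: weight}
    adjacency = {}
    for pair, weight in edges.items():
        a, b = sorted(pair)
        adjacency.setdefault(a, {})[b] = weight
        adjacency.setdefault(b, {})[a] = weight

    scores = []
    for a, b in combinations(sorted(adjacency), 2):
        if frozenset((a, b)) in edges:
            continue  # already linked, nothing to predict
        shared = adjacency[a].keys() & adjacency[b].keys()
        # Each shared neighbour contributes the product of the two edge weights
        score = sum(adjacency[a][n] * adjacency[b][n] for n in shared)
        if score > 0:
            scores.append((score, a, b))
    return sorted(scores, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical toy network (placeholder gene names, not real data)
toy_edges = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G2", "G3")): 0.8,
    frozenset(("G1", "G4")): 0.5,
    frozenset(("G3", "G4")): 0.4,
    frozenset(("G4", "G5")): 0.7,
}
predicted = common_neighbour_scores(toy_edges)
for score, a, b in predicted:
    print(f"{a}-{b}: {score:.2f}")
```

    In this toy network the highest-scoring candidate is the G1-G3 pair, which shares two neighbours. A real workflow would go on to refine the scored links, for example by cross-validation as the abstract describes.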

  6. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

    Science.gov (United States)

    Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J

    2018-02-02

    Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower ( Brassica oleracea var. botrytis ) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.

  7. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower

    Directory of Open Access Journals (Sweden)

    Patrick Thorwarth

    2018-02-01

    Full Text Available Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding.

  8. Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Jing; Ma, Zihao; Carr, Steven A.; Mertins, Philipp; Zhang, Hui; Zhang, Zhen; Chan, Daniel W.; Ellis, Matthew J. C.; Townsend, R. Reid; Smith, Richard D.; McDermott, Jason E.; Chen, Xian; Paulovich, Amanda G.; Boja, Emily S.; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Rodland, Karin D.; Liebler, Daniel C.; Zhang, Bing

    2016-11-11

    Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this “guilt-by-association” (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies
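
    The "guilt-by-association" idea in this record can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the correlation threshold, the majority vote, the gene names, the profiles and the annotation labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / sqrt(vx * vy)


def predict_function(target, profiles, annotations, threshold=0.8):
    """Guilt-by-association: majority vote among annotated coexpressed genes."""
    votes = Counter()
    for gene, profile in profiles.items():
        if gene == target or gene not in annotations:
            continue
        if pearson(profiles[target], profile) >= threshold:
            votes[annotations[gene]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical expression (mRNA or protein) profiles across five samples
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 11.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
    "query": [1.1, 2.0, 2.9, 4.2, 5.0],
}
annotations = {"geneA": "process_X", "geneB": "process_X", "geneC": "process_Y"}

print(predict_function("query", profiles, annotations))
```

    The same function can be run on mRNA-derived and protein-derived profiles to compare which data type recovers known annotations more often, which is the kind of comparison the abstract reports.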

  9. Gene expression variation to predict 10-year survival in lymph-node-negative breast cancer

    International Nuclear Information System (INIS)

    Karlsson, Elin; Delle, Ulla; Danielsson, Anna; Olsson, Björn; Abel, Frida; Karlsson, Per; Helou, Khalil

    2008-01-01

    It is of great significance to find better markers to correctly distinguish between high-risk and low-risk breast cancer patients since the majority of breast cancer cases are at present being overtreated. Forty-six tumours from node-negative breast cancer patients were studied with gene expression microarrays. A t-test was carried out in order to find a set of genes whose expression might predict clinical outcome. Two classifiers were used for evaluation of the gene lists: a correlation-based classifier and a Voting Features Interval (VFI) classifier. We then evaluated the predictive accuracy of this expression signature on tumour sets from two similar studies on lymph-node-negative patients. Both studies had developed gene expression signatures superior to current methods in classifying node-negative breast tumours. These two signatures were also tested on our material. A list of 51 genes whose expression profiles could predict clinical outcome with high accuracy in our material (96% or 89% accuracy in cross-validation, depending on type of classifier) was developed. When tested on two independent data sets, the expression signature based on the 51 identified genes had good predictive qualities in one of the data sets (74% accuracy), whereas its predictive value on the other data set was poor, presumably because only 23 of the 51 genes were found in that material. We also found that previously developed expression signatures could predict clinical outcome well to moderately well in our material (72% and 61%, respectively). The list of 51 genes derived in this study might have potential for clinical utility as a prognostic gene set, and may include candidate genes of potential relevance for clinical outcome in breast cancer. According to the predictions by this expression signature, 30 of the 46 patients may have benefited from different adjuvant treatment than they received. The research on these tumours was approved by the Medical Faculty Research
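
    The two steps named in this record, t-test gene selection followed by a correlation-based classifier, can be sketched as follows. The abstract gives no implementation details, so the Welch variant of the t-statistic, the class-centroid rule and all gene names and values below are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(expr, labels, k):
    """Return the k genes with the largest |t| between outcome groups 0 and 1."""
    def abs_t(gene):
        g0 = [v for v, lab in zip(expr[gene], labels) if lab == 0]
        g1 = [v for v, lab in zip(expr[gene], labels) if lab == 1]
        return abs(welch_t(g0, g1))
    return sorted(expr, key=abs_t, reverse=True)[:k]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def classify(sample, expr, labels, genes):
    """Assign the class whose mean profile correlates best with the sample."""
    profile = [sample[g] for g in genes]
    centroids = {
        cls: [mean(v for v, lab in zip(expr[g], labels) if lab == cls) for g in genes]
        for cls in (0, 1)
    }
    return max(centroids, key=lambda cls: pearson(profile, centroids[cls]))


# Hypothetical training data: six tumours, outcome 0 = good, 1 = poor
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g_up": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
    "g_up2": [0.5, 0.7, 0.6, 2.0, 2.2, 1.9],
    "g_down": [3.0, 2.9, 3.2, 1.0, 1.1, 0.8],
    "g_flat": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
signature = select_genes(expr, labels, k=3)
new_tumour = {"g_up": 2.9, "g_up2": 1.8, "g_down": 1.2, "g_flat": 2.0}
print(signature, classify(new_tumour, expr, labels, signature))
```

    Selecting genes on the same samples used to estimate accuracy inflates the estimate, so gene selection would need to be repeated inside each cross-validation fold.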

  10. lncRNA Gene Signatures for Prediction of Breast Cancer Intrinsic Subtypes and Prognosis

    Directory of Open Access Journals (Sweden)

    Silu Zhang

    2018-01-01

    Full Text Available Background: Breast cancer is intrinsically heterogeneous and is commonly classified into four main subtypes associated with distinct biological features and clinical outcomes. However, currently available data resources and methods are largely limited to molecular subtyping based on protein-coding genes, and little is known about the roles of long non-coding RNAs (lncRNAs), which occupy 98% of the whole genome. lncRNAs may also play important roles in subgrouping cancer patients and are associated with clinical phenotypes. Methods: The purpose of this project was to identify lncRNA gene signatures that are associated with breast cancer subtypes and clinical outcomes. We identified lncRNA gene signatures from The Cancer Genome Atlas (TCGA) RNAseq data that are associated with breast cancer subtypes by an optimized 1-Norm SVM feature selection algorithm. We evaluated the prognostic performance of these gene signatures with a semi-supervised principal component (superPC) method. Results: Although lncRNAs can independently predict breast cancer subtypes with satisfactory accuracy, a combined gene signature including both coding and non-coding genes gives the best clinically relevant prediction performance. We highlighted eight potential biomarkers (three from coding genes and five from non-coding genes) that are significantly associated with survival outcomes. Conclusion: Our proposed methods are a novel means of identifying subtype-specific coding and non-coding potential biomarkers that are both clinically relevant and biologically significant.

  11. Predictive networks: a flexible, open source, web application for integration and analysis of human gene networks.

    Science.gov (United States)

    Haibe-Kains, Benjamin; Olsen, Catharina; Djebbari, Amira; Bontempi, Gianluca; Correll, Mick; Bouton, Christopher; Quackenbush, John

    2012-01-01

    Genomics provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is to use these 'known' interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from http://predictivenetworks.org. The PN code base is freely available at https://sourceforge.net/projects/predictivenets/.

  12. Can Thrifty Gene(s) or Predictive Fetal Programming for Thriftiness Lead to Obesity?

    Directory of Open Access Journals (Sweden)

    Ulfat Baig

    2011-01-01

    Full Text Available Obesity and related disorders are thought to have their roots in metabolic “thriftiness” that evolved to combat periodic starvation. The association of low birth weight with obesity in later life caused a shift in the concept from thrifty gene to thrifty phenotype or anticipatory fetal programming. The assumption of thriftiness is implicit in obesity research. We examine here, with the help of a mathematical model, the conditions for evolution of thrifty genes or fetal programming for thriftiness. The model suggests that a thrifty gene cannot exist in a stable polymorphic state in a population. The conditions for evolution of thrifty fetal programming are restricted if the correlation between intrauterine and lifetime conditions is poor. Such a correlation is not observed in natural courses of famine. If there is fetal programming for thriftiness, it could have evolved in anticipation of social factors affecting nutrition that can result in a positive correlation.

  13. EvoCor: a platform for predicting functionally related genes using phylogenetic and expression profiles.

    Science.gov (United States)

    Dittmar, W James; McIver, Lauren; Michalak, Pawel; Garner, Harold R; Valdez, Gregorio

    2014-07-01

    The wealth of publicly available gene expression and genomic data provides unique opportunities for computational inference to discover groups of genes that function to control specific cellular processes. Such genes are likely to have co-evolved and be expressed in the same tissues and cells. Unfortunately, the expertise and computational resources required to compare tens of genomes and gene expression data sets make this type of analysis difficult for the average end-user. Here, we describe the implementation of a web server that predicts genes involved in affecting specific cellular processes together with a gene of interest. We termed the server 'EvoCor', to denote that it detects functional relationships among genes through evolutionary analysis and gene expression correlation. This web server integrates profiles of sequence divergence derived by a Hidden Markov Model (HMM) and tissue-wide gene expression patterns to determine putative functional linkages between pairs of genes. This server is easy to use and freely available at http://pilot-hmm.vbi.vt.edu/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. A gene signature in histologically normal surgical margins is predictive of oral carcinoma recurrence

    International Nuclear Information System (INIS)

    Reis, Patricia P; Simpson, Colleen; Goldstein, David; Brown, Dale; Gilbert, Ralph; Gullane, Patrick; Irish, Jonathan; Jurisica, Igor; Kamel-Reid, Suzanne; Waldron, Levi; Perez-Ordonez, Bayardo; Pintilie, Melania; Galloni, Natalie Naranjo; Xuan, Yali; Cervigne, Nilva K; Warner, Giles C; Makitie, Antti A

    2011-01-01

    Oral Squamous Cell Carcinoma (OSCC) is a major cause of cancer death worldwide, which is mainly due to recurrence leading to treatment failure and patient death. Histological status of surgical margins is a currently available assessment for recurrence risk in OSCC; however histological status does not predict recurrence, even in patients with histologically negative margins. Therefore, molecular analysis of histologically normal resection margins and the corresponding OSCC may aid in identifying a gene signature predictive of recurrence. We used a meta-analysis of 199 samples (OSCCs and normal oral tissues) from five public microarray datasets, in addition to our microarray analysis of 96 OSCCs and histologically normal margins from 24 patients, to train a gene signature for recurrence. Validation was performed by quantitative real-time PCR using 136 samples from an independent cohort of 30 patients. We identified 138 significantly over-expressed genes (> 2-fold, false discovery rate of 0.01) in OSCC. By penalized likelihood Cox regression, we identified a 4-gene signature with prognostic value for recurrence in our training set. This signature comprised the invasion-related genes MMP1, COL4A1, P4HA2, and THBS2. Over-expression of this 4-gene signature in histologically normal margins was associated with recurrence in our training cohort (p = 0.0003, logrank test) and in our independent validation cohort (p = 0.04, HR = 6.8, logrank test). Gene expression alterations occur in histologically normal margins in OSCC. Over-expression of the 4-gene signature in histologically normal surgical margins was validated and highly predictive of recurrence in an independent patient cohort. Our findings may be applied to develop a molecular test, which would be clinically useful to help predict which patients are at a higher risk of local recurrence

  15. MITEs in the promoters of effector genes allow prediction of novel virulence genes in Fusarium oxysporum

    NARCIS (Netherlands)

    Schmidt, S.M.; Houterman, P.M.; Schreiver, I.; Ma, L.; Amyotte, S.; Chellappan, B.; Boeren, S.; Takken, F.L.W.; Rep, M.

    2013-01-01

    Background The plant-pathogenic fungus Fusarium oxysporum f.sp.lycopersici (Fol) has accessory, lineage-specific (LS) chromosomes that can be transferred horizontally between strains. A single LS chromosome in the Fol4287 reference strain harbors all known Fol effector genes. Transfer of this

  16. Gene prediction in metagenomic fragments: A large scale machine learning approach

    Directory of Open Access Journals (Sweden)

    Morgenstern Burkhard

    2008-04-01

    Full Text Available Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of most fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and their unknown phylogenetic origin require approaches to gene prediction that differ from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large scale machine learning methods are well-suited for gene
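
    Two of the inputs this record mentions for the second stage, open reading frame length and fragment GC-content, are simple to compute. The sketch below is a simplification under stated assumptions: it scans only the forward strand and only complete ATG-to-stop ORFs, whereas the method described also handles incomplete genes. The function names and the toy fragment are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def forward_orfs(seq, min_codons=3):
    """Yield (start, end, frame) for complete ATG...stop ORFs, forward strand only.

    end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield (start, i + 3, frame)
                start = None  # look for the next candidate in this frame


# Hypothetical toy fragment; real fragments are hundreds of bases long
fragment = "CCATGGCGTGCTAAGG"
features = [
    {"start": s, "end": e, "frame": f, "orf_length": e - s, "gc": gc_content(fragment)}
    for s, e, f in forward_orfs(fragment)
]
print(features)
```

    Feature dictionaries like these would then be passed, together with codon-usage and initiation-site scores, to whatever classifier assigns a coding probability to each candidate.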

  17. Probability-based collaborative filtering model for predicting gene-disease associations.

    Science.gov (United States)

    Zeng, Xiangxiang; Ding, Ningxiang; Rodríguez-Patón, Alfonso; Zou, Quan

    2017-12-28

    Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene-disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expenses. We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data of humans and data of other nonhuman species, are integrated in our model. Firstly, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Secondly, we develop a modified model II with personal heterogeneous regularization to enhance the accuracy of the aforementioned model. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used. We compared the results of PCFM with the results of four state-of-the-art approaches. The results show that PCFM performs better than other advanced approaches. The PCFM model can be leveraged for predictions of disease genes, especially for new human genes or diseases with no known relationships.

  18. Computational prediction and experimental validation of Ciona intestinalis microRNA genes

    Directory of Open Access Journals (Sweden)

    Pasquinelli Amy E

    2007-11-01

    Full Text Available Background: This study reports the first collection of validated microRNA genes in the sea squirt, Ciona intestinalis. MicroRNAs are processed from hairpin precursors to ~22 nucleotide RNAs that base pair to target mRNAs and inhibit expression. As a member of the subphylum Urochordata (Tunicata) whose larval form has a notochord, the sea squirt is situated at the emergence of vertebrates, and therefore may provide information about the evolution of molecular regulators of early development. Results: In this study, computational methods were used to predict 14 microRNA gene families in Ciona intestinalis. The microRNA prediction algorithm utilizes configurable microRNA sequence conservation and stem-loop specificity parameters, grouping by miRNA family, and phylogenetic conservation to the related species, Ciona savignyi. Expression of 8 of the 9 putative microRNAs tested was validated in the adult tissue of Ciona intestinalis by Northern blot analyses. Additionally, a target prediction algorithm was implemented, which identified a high-confidence list of 240 potential target genes. Over half of the predicted targets can be grouped into the gene ontology categories of metabolism, transport, regulation of transcription, and cell signaling. Conclusion: The computational techniques implemented in this study can be applied to other organisms and serve to increase the understanding of the origins of non-coding RNAs, embryological and cellular developmental pathways, and the mechanisms for microRNA-controlled gene regulatory networks.

  19. An ensemble method to predict target genes and pathways in uveal melanoma

    Directory of Open Access Journals (Sweden)

    Wei Chao

    2018-04-01

    Full Text Available This work proposes to predict target genes and pathways for uveal melanoma (UM) based on an ensemble method and pathway analyses. Methods: The ensemble method integrated a correlation method (Pearson correlation coefficient, PCC), a causal inference method (IDA) and a regression method (Lasso) utilizing the Borda count election method. Subsequently, to validate the performance of the PIL method, predicted miRNA targets were compared with a database of confirmed targets. Ultimately, pathway enrichment analysis was conducted on target genes in the top 1,000 miRNA-mRNA interactions to identify target pathways for UM patients. Results: Thirty-eight of the predicted interactions matched confirmed interactions, indicating that the ensemble method was a suitable and feasible approach to predict miRNA targets. We obtained 50 seed miRNA-mRNA interactions of UM patients and extracted target genes from these interactions, such as ASPG, BSDC1 and C4BP. The 601 target genes in the top 1,000 miRNA-mRNA interactions were enriched in 12 target pathways, of which Phototransduction was the most significant. Conclusion: The target genes and pathways might provide a new way to reveal the molecular mechanism of UM and help guide targeted treatment and prevention of this malignant tumor.
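
    The Borda count used to merge the three method-specific rankings can be sketched briefly. The scoring convention below (an item at position p in a list of n items earns n - 1 - p points) is one common form of the Borda count and is assumed here; the interaction labels and their orderings are hypothetical placeholders, not results from the study.

```python
def borda_aggregate(rankings):
    """Combine several best-first rankings of the same items with a Borda count.

    Returns the items sorted by total points, ties broken alphabetically.
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # Top of a list of n items earns n - 1 points, the bottom earns 0
            points[item] = points.get(item, 0) + (n - 1 - position)
    return sorted(points, key=lambda item: (-points[item], item))


# Hypothetical rankings of miRNA-mRNA interactions from three methods
pcc_rank = ["miR-a|GENE1", "miR-b|GENE2", "miR-c|GENE3", "miR-d|GENE4"]
ida_rank = ["miR-b|GENE2", "miR-a|GENE1", "miR-d|GENE4", "miR-c|GENE3"]
lasso_rank = ["miR-b|GENE2", "miR-c|GENE3", "miR-a|GENE1", "miR-d|GENE4"]

consensus = borda_aggregate([pcc_rank, ida_rank, lasso_rank])
print(consensus)
# -> ['miR-b|GENE2', 'miR-a|GENE1', 'miR-c|GENE3', 'miR-d|GENE4']
```

    Because the count uses only rank positions, the three methods can be combined even though their raw scores (correlations, causal effects, regression coefficients) are on different scales.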

  20. CAsubtype: An R Package to Identify Gene Sets Predictive of Cancer Subtypes and Clinical Outcomes.

    Science.gov (United States)

    Kong, Hualei; Tong, Pan; Zhao, Xiaodong; Sun, Jielin; Li, Hua

    2018-03-01

    In the past decade, molecular classification of cancer has gained high popularity owing to its high predictive power on clinical outcomes as compared with traditional methods commonly used in clinical practice. In particular, using gene expression profiles, recent studies have successfully identified a number of gene sets for the delineation of cancer subtypes that are associated with distinct prognosis. However, identification of such gene sets remains a laborious task due to the lack of tools with flexibility, integration and ease of use. To reduce the burden, we have developed an R package, CAsubtype, to efficiently identify gene sets predictive of cancer subtypes and clinical outcomes. By integrating more than 13,000 annotated gene sets, CAsubtype provides a comprehensive repertoire of candidates for new cancer subtype identification. For easy data access, CAsubtype further includes the gene expression and clinical data of more than 2000 cancer patients from TCGA. CAsubtype first employs principal component analysis to identify gene sets (from user-provided or package-integrated ones) with robust principal components representing significantly large variation between cancer samples. Based on these principal components, CAsubtype visualizes the sample distribution in low-dimensional space for better understanding of the distinction between samples and classifies samples into subgroups with prevalent clustering algorithms. Finally, CAsubtype performs survival analysis to compare the clinical outcomes between the identified subgroups, assessing their clinical value as potentially novel cancer subtypes. In conclusion, CAsubtype is a flexible and well-integrated tool in the R environment to identify gene sets for cancer subtype identification and clinical outcome prediction. Its simple R commands and comprehensive data sets enable efficient examination of the clinical value of any given gene set, thus facilitating hypothesis generating and testing in biological and

  1. Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets

    Directory of Open Access Journals (Sweden)

    Karacali Bilge

    2007-10-01

    Full Text Available Abstract Background: Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal) samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. Prediction error of profiles identified with supervised univariate feature selection algorithms was compared to that of profiles selected randomly from (a) all genes on the microarray platform and (b) a list of known disease-related genes (a priori selection). We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms. Results: Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection; however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across datasets to samples profiled on the same microarray platform. Conclusion: Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. 
In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine

  2. Entropy-based gene ranking without selection bias for the predictive classification of microarray data

    Directory of Open Access Journals (Sweden)

    Serafini Maria

    2003-11-01

    Full Text Available Abstract Background: We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). Results: With E-RFE, we speed up recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Conclusions: Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.

  3. Minimal gene selection for classification and diagnosis prediction based on gene expression profile

    Directory of Open Access Journals (Sweden)

    Alireza Mehridehnavi

    2013-01-01

    Conclusion: We have shown that using the two most significant genes, ranked by their S/N ratios, together with a suitable selection of training samples, can classify DLBCL patients with rather good results. With these methods we could compensate for the small number of patients, improve classification accuracy, and reduce computational complexity and therefore running time.
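
    The record above does not define its S/N ratio. As a rough illustration only, the sketch below assumes the commonly used signal-to-noise score (difference of class means divided by the sum of class standard deviations) and ranks genes by its absolute value. The function names and the toy expression values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Assumed S/N score: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def top_genes(expr_a, expr_b, k=2):
    """Rank genes by |S/N| between two classes and return the k best names.

    expr_a and expr_b map a gene name to its expression values in each class.
    """
    scores = {g: signal_to_noise(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


# Hypothetical toy data: three genes, three samples per class.
expr_a = {"g1": [10, 11, 12], "g2": [5, 6, 7], "g3": [1, 2, 3]}
expr_b = {"g1": [1, 2, 3], "g2": [4, 5, 6], "g3": [1, 2, 3]}
selected = top_genes(expr_a, expr_b, k=2)
```

    With only two genes retained, any downstream classifier stays very small, which is consistent with the reduced running time the authors report.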

  4. Gene expression prediction by soft integration and the elastic net-best performance of the DREAM3 gene expression challenge.

    Directory of Open Access Journals (Sweden)

    Mika Gustafsson

    Full Text Available BACKGROUND: To predict gene expressions is an important endeavour within computational systems biology. It can both be a way to explore how drugs affect the system, as well as providing a framework for finding which genes are interrelated in a certain process. A practical problem, however, is how to assess and discriminate among the various algorithms which have been developed for this purpose. Therefore, in 2008 the DREAM project issued a challenge for predicting gene expression values, and here we present the algorithm with the best performance. METHODOLOGY/PRINCIPAL FINDINGS: We develop an algorithm by exploring various regression schemes with different model selection procedures. It turns out that the most effective scheme is based on least squares, with a penalty term of a recently developed form called the "elastic net". Key components in the algorithm are the integration of expression data from experimental conditions other than those presented for the challenge and the utilization of transcription factor binding data for guiding the inference process towards known interactions. Also of importance is a cross-validation procedure where each form of external data is used only to the extent that it increases the expected performance. CONCLUSIONS/SIGNIFICANCE: Our algorithm demonstrates both the possibility of extracting information from large-scale expression data concerning prediction of gene levels, as well as the benefits of integrating different data sources for improving the inference. We believe the former is an important message to those still hesitating on the possibilities for computational approaches, while the latter is part of an important way forward for the future development of the field of computational systems biology.
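
    For reference, the standard elastic-net penalised least-squares objective can be written as below. This is the textbook form, not necessarily the exact variant or weighting the authors used; the symbols (design matrix X, response y, coefficients β, penalty weights λ1 and λ2) are generic and do not come from the abstract.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 \;+\; \lambda_1 \lVert \beta \rVert_1 \;+\; \lambda_2 \lVert \beta \rVert_2^2
```

    Setting λ2 = 0 recovers the lasso and setting λ1 = 0 recovers ridge regression, so the two weights trade sparsity against shrinkage of correlated predictors.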

  5. A 65‑gene signature for prognostic prediction in colon adenocarcinoma.

    Science.gov (United States)

    Jiang, Hui; Du, Jun; Gu, Jiming; Jin, Liugen; Pu, Yong; Fei, Bojian

    2018-04-01

    The aim of the present study was to examine the molecular factors associated with the prognosis of colon cancer. Gene expression datasets were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus databases to screen differentially expressed genes (DEGs) between colon cancer samples and normal samples. Survival‑related genes were selected from the DEGs using the Cox regression method. A co‑expression network of survival‑related genes was then constructed, and functional clusters were extracted from this network. The significantly enriched functions and pathways of the genes in the network were identified. Using Bayesian discriminant analysis, a prognostic prediction system was established to distinguish the positive from negative prognostic samples. The discrimination efficacy of the system was validated in the GSE17538 dataset using Kaplan‑Meier survival analysis. A total of 636 and 1,892 DEGs between the colon cancer samples and normal samples were screened from the TCGA and GSE44861 dataset, respectively. There were 155 survival‑related genes selected. The co‑expression network of survival‑related genes included 138 genes, 534 lines (connections) and five functional clusters, including the signaling pathway, cellular response to cAMP, and immune system process functional clusters. The molecular function, cellular components and biological processes were the significantly enriched functions. The peroxisome proliferator‑activated receptor signaling pathway, Wnt signaling pathway, B cell receptor signaling pathway, and cytokine‑cytokine receptor interactions were the significant pathways. A prognostic prediction system based on a 65‑gene signature was established using this co‑expression network. Its discriminatory effect was validated in the TCGA dataset (P=3.56e‑12) and the GSE17538 dataset (P=1.67e‑6). The 65‑gene signature included kallikrein‑related peptidase 6 (KLK6), collagen type XI α1 (COL11A1), cartilage

  6. The predictive value of the 70-gene signature for adjuvant chemotherapy in early breast cancer

    NARCIS (Netherlands)

    Knauer, Michael; Mook, Stella; Rutgers, Emiel J. T.; Bender, Richard A.; Hauptmann, Michael; van de Vijver, Marc J.; Koornstra, Rutger H. T.; Bueno-de-Mesquita, Jolien M.; Linn, Sabine C.; van 't Veer, Laura J.

    2010-01-01

    Multigene assays have been developed and validated to determine the prognosis of breast cancer. In this study, we assessed the additional predictive value of the 70-gene MammaPrint signature for chemotherapy (CT) benefit in addition to endocrine therapy (ET) from pooled study series. For 541

  7. [The value of 5-HTT gene polymorphism for the assessment and prediction of male adolescence violence].

    Science.gov (United States)

    Yu, Yue; Liu, Xiang; Yang, Zhen-xing; Qiu, Chang-jian; Ma, Xiao-hong

    2012-08-01

    To establish an adolescent violent crime prediction model, and to assess the value of serotonin transporter (5-HTT) gene polymorphism for the assessment and prediction of violent crime. Investigative tools were used to analyze the difference in personality dimensions, social support, coping styles, aggressiveness, impulsivity, and family condition scale between 223 adolescents with violent behavior and 148 adolescents without violent behavior. The distribution of 5-HTT gene polymorphisms (5-HTTLPR and 5-HTTVNTR) was compared between the two groups. The role of 5-HTT gene polymorphism in adolescent personality, impulsivity and aggression scales was also analyzed. Stepwise logistic regression was used to establish a predictive model for adolescent violent crime. Significant differences were found between the violence group and the control group on multiple dimensions of the psychology and environment scales. However, no statistical difference was found with regard to the 5-HTT genotypes and alleles between adolescents with violent behaviors and normal controls. The rate of prediction accuracy was not significantly improved when 5-HTT gene polymorphism was added to the model. Violent crime among adolescents was closely related to social and environmental factors. No association was found between 5-HTT polymorphisms and adolescent violent criminal behavior.

  8. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.

    Science.gov (United States)

    Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin

    2013-09-22

    High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.

  9. Coregulation of terpenoid pathway genes and prediction of isoprene production in Bacillus subtilis using transcriptomics

    Energy Technology Data Exchange (ETDEWEB)

    Hess, Becky M.; Xue, Junfeng; Markillie, Lye Meng; Taylor, Ronald C.; Wiley, H. S.; Ahring, Birgitte K.; Linggi, Bryan E.

    2013-06-19

    The isoprenoid pathway converts pyruvate to isoprene and related isoprenoid compounds in plants and some bacteria. Currently, this pathway is of great interest because of the critical role that isoprenoids play in basic cellular processes as well as the industrial value of metabolites such as isoprene. Although the regulation of several pathway genes has been described, there is a paucity of information regarding the system level regulation and control of the pathway. To address this limitation, we examined Bacillus subtilis grown under multiple conditions and then determined the relationship between altered isoprene production and the pattern of gene expression. We found that terpenoid genes appeared to fall into two distinct subsets with opposing correlations with respect to the amount of isoprene produced. The group whose expression levels positively correlated with isoprene production included dxs, the gene responsible for the commitment step in the pathway, as well as ispD, and two genes that participate in the mevalonate pathway, yhfS and pksG. The subset of terpenoid genes that inversely correlated with isoprene production included ispH, ispF, hepS, uppS, ispE, and dxr. A genome wide partial least squares regression model was created to identify other genes or pathways that contribute to isoprene production. This analysis showed that a subset of 213 regulated genes was sufficient to create a predictive model of isoprene production under different conditions and showed correlations at the transcriptional level. We conclude that gene expression levels alone are sufficiently informative about the metabolic state of a cell that produces increased isoprene and can be used to build a model which accurately predicts production of this secondary metabolite across many simulated environmental conditions.

  10. HOX Gene Promoter Prediction and Inter-genomic Comparison: An Evo-Devo Study

    Directory of Open Access Journals (Sweden)

    Marla A. Endriga

    2010-10-01

    Full Text Available Homeobox genes direct the anterior-posterior axis of the body plan in eukaryotic organisms. Promoter regions upstream of the Hox genes jumpstart the transcription process. CpG islands found within the promoter regions can cause silencing of these promoters. The locations of the promoter regions and the CpG islands of Homo sapiens sapiens (human), Pan troglodytes (chimpanzee), Mus musculus (mouse), and Rattus norvegicus (brown rat) are compared and related to the possible influence on the specification of the mammalian body plan. The sequence of each gene in Hox clusters A-D of the mammals considered was retrieved from Ensembl, and locations of promoter regions and CpG islands were predicted using Exon Finder. The predicted promoter sequences were confirmed via BLAST and verified against the Eukaryotic Promoter Database. The significance of the locations was determined using the Kruskal-Wallis test. Among the four clusters, only promoter locations in cluster B showed a significant difference. HOX B genes have been linked with the control of genes that direct the development of axial morphology, particularly of the vertebral column bones. The magnitude of variation among the body plans of closely related species can thus be partially attributed to the promoter kind, location and number, and gene inactivation via CpG methylation.
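
    The Kruskal-Wallis test mentioned in the record above compares rank sums across groups. The following is a minimal sketch of the H statistic only, assuming tied values receive average ranks and omitting the tie correction and the p-value; the function names and the example numbers are hypothetical, not data from the study.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical example: three fully separated groups give H of about 7.2.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

    In practice H would be compared against a chi-squared distribution with (number of groups - 1) degrees of freedom to obtain a p-value.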

  11. Effectiveness of gene expression profiling for response prediction of rectal cancer to preoperative radiotherapy

    International Nuclear Information System (INIS)

    Ojima, Eiki; Inoue, Yasuhiro; Miki, Chikao; Kusunoki, Masato; Mori, Masaki

    2007-01-01

    Our aim was to determine whether the expression levels of specific genes could predict clinical radiosensitivity in human colorectal cancer. Radioresistant colorectal cancer cell lines were established by repeated X-ray exposure (total, 100 Gy), and the gene expressions of the parent and radioresistant cell lines were compared in a microarray analysis. To verify the microarray data, we carried out a reverse transcriptase-polymerase chain reaction analysis of identified genes in clinical samples from 30 irradiated rectal cancer patients. A comparison of the intensity data for the parent and three radioresistant cell lines revealed 17 upregulated and 142 downregulated genes in all radioresistant cell lines. Next, we focused on two upregulated genes, PTMA (prothymosin α) and EIF5a2 (eukaryotic translation initiation factor 5A), in the radioresistant cell lines. In clinical samples, the expression of PTMA was significantly higher in the minor effect group than in the major effect group (P=0.004), but there were no significant differences in EIF5a2 expression between the two groups. We identified radiation-related genes in colorectal cancer and demonstrated that PTMA may play an important role in radiosensitivity. Our findings suggest that PTMA may be a novel marker for predicting the effectiveness of radiotherapy in clinical cases. (author)

  12. Paired hormone response elements predict caveolin-1 as a glucocorticoid target gene.

    Directory of Open Access Journals (Sweden)

    Marinus F van Batenburg

    2010-01-01

    Full Text Available Glucocorticoids act in part via glucocorticoid receptor binding to hormone response elements (HREs), but their direct target genes in vivo are still largely unknown. We developed the criterion that genomic occurrence of paired HREs at an inter-HRE distance less than 200 bp predicts hormone responsiveness, based on synergy of multiple HREs, and HRE information from known target genes. This criterion predicts a substantial number of novel responsive genes, when applied to genomic regions 10 kb upstream of genes. Multiple-tissue in situ hybridization showed that mRNA expression of 6 out of 10 selected genes was induced in a tissue-specific manner in mice treated with a single dose of corticosterone, with the spleen being the most responsive organ. Caveolin-1 was strongly responsive in several organs, and the HRE pair in its upstream region showed increased occupancy by glucocorticoid receptor in response to corticosterone. Our approach allowed for discovery of novel tissue specific glucocorticoid target genes, which may exemplify responses underlying the permissive actions of glucocorticoids.
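
    The pairing criterion in the record above (two HREs less than 200 bp apart within the upstream region) can be sketched as follows. This is an illustration only: it assumes HRE positions have already been predicted by some other means and are given as coordinates relative to the transcription start site, and the gene names and positions are hypothetical.

```python
def paired_sites(positions, max_gap=200):
    """Return adjacent site pairs whose distance is strictly below max_gap (bp)."""
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < max_gap]


def predicted_responsive(sites_by_gene, max_gap=200):
    """Names of genes that have at least one HRE pair closer than max_gap."""
    return sorted(g for g, pos in sites_by_gene.items() if paired_sites(pos, max_gap))


# Hypothetical HRE positions within the 10 kb upstream region of three genes.
sites = {
    "geneA": [-9500, -9380, -2000],  # one pair 120 bp apart
    "geneB": [-8000, -3000],         # sites too far apart
    "geneC": [-150],                 # a single site cannot form a pair
}
hits = predicted_responsive(sites)
```

    Checking only adjacent sites after sorting is enough, because any pair closer than the threshold implies that at least one adjacent pair is also closer than the threshold.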

  13. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E ... Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
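
    The first step described in the record above, correlating the expression of chromosomally adjacent genes, might be sketched as below. Only the Pearson coefficient for neighbouring pairs is shown; the comparison against randomly selected pairs and the FDR control are not reproduced. The gene names and expression vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def neighbor_correlations(ordered_genes, expr):
    """Correlate each gene with the next gene along the chromosome.

    ordered_genes lists gene names in chromosomal order;
    expr maps a gene name to its expression values across datasets.
    """
    return {
        (g1, g2): pearson(expr[g1], expr[g2])
        for g1, g2 in zip(ordered_genes, ordered_genes[1:])
    }


# Hypothetical expression values across four datasets.
expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
corr = neighbor_correlations(["a", "b", "c"], expr)
```

    Runs of consecutive pairs with high correlation would then be candidate clusters, subject to the significance testing the abstract describes.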

  14. Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma.

    Science.gov (United States)

    Fowles, Jared S; Brown, Kristen C; Hess, Ann M; Duval, Dawn L; Gustafson, Daniel L

    2016-02-19

    Genomics-based predictors of drug response have the potential to improve outcomes associated with cancer therapy. Osteosarcoma (OS), the most common primary bone cancer in dogs, is commonly treated with adjuvant doxorubicin or carboplatin following amputation of the affected limb. We evaluated the use of gene-expression based models built in an intra- or interspecies manner to predict chemosensitivity and treatment outcome in canine OS. Models were built and evaluated using microarray gene expression and drug sensitivity data from human and canine cancer cell lines, and canine OS tumor datasets. The "COXEN" method was utilized to filter gene signatures between human and dog datasets based on strong co-expression patterns. Models were built using linear discriminant analysis via the misclassification penalized posterior algorithm. The best doxorubicin model involved genes identified in human lines that were co-expressed and trained on canine OS tumor data, which accurately predicted clinical outcome in 73 % of dogs (p = 0.0262, binomial). The best carboplatin model utilized canine lines for gene identification and model training, with canine OS tumor data for co-expression. Dogs whose treatment matched our predictions had significantly better clinical outcomes than those that didn't (p = 0.0006, Log Rank), and this predictor significantly associated with longer disease free intervals in a Cox multivariate analysis (hazard ratio = 0.3102, p = 0.0124). Our data show that intra- and interspecies gene expression models can successfully predict response in canine OS, which may improve outcome in dogs and serve as pre-clinical validation for similar methods in human cancer research.

  15. Cell-specific prediction and application of drug-induced gene expression profiles.

    Science.gov (United States)

    Hodos, Rachel; Zhang, Ping; Lee, Hao-Chih; Duan, Qiaonan; Wang, Zichen; Clark, Neil R; Ma'ayan, Avi; Wang, Fei; Kidd, Brian; Hu, Jianying; Sontag, David; Dudley, Joel

    2018-01-01

    Gene expression profiling of in vitro drug perturbations is useful for many biomedical discovery applications including drug repurposing and elucidation of drug mechanisms. However, limited data availability across cell types has hindered our capacity to leverage or explore the cell-specificity of these perturbations. While recent efforts have generated a large number of drug perturbation profiles across a variety of human cell types, many gaps remain in this combinatorial drug-cell space. Hence, we asked whether it is possible to fill these gaps by predicting cell-specific drug perturbation profiles using available expression data from related conditions--i.e. from other drugs and cell types. We developed a computational framework that first arranges existing profiles into a three-dimensional array (or tensor) indexed by drugs, genes, and cell types, and then uses either local (nearest-neighbors) or global (tensor completion) information to predict unmeasured profiles. We evaluate prediction accuracy using a variety of metrics, and find that the two methods have complementary performance, each superior in different regions in the drug-cell space. Predictions achieve correlations of 0.68 with true values, and maintain accurate differentially expressed genes (AUC 0.81). Finally, we demonstrate that the predicted profiles add value for making downstream associations with drug targets and therapeutic classes.

  16. In silico prediction of novel therapeutic targets using gene-disease association data.

    Science.gov (United States)

    Ferrero, Enrico; Dunham, Ian; Sanseau, Philippe

    2017-08-29

    Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene-disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market. To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets. We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature. Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target
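
    The nested cross-validation mentioned in the record above keeps model selection (inner loop) separate from performance estimation (outer loop). The sketch below shows only the index bookkeeping, under the assumption of simple interleaved folds without shuffling or stratification; the function names and fold counts are illustrative and are not taken from the paper.

```python
def k_folds(indices, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits partitions outer_train only, so the outer test samples are
    never seen while hyperparameters or models are being selected.
    """
    for outer_train, outer_test in k_folds(list(range(n_samples)), outer_k):
        inner_splits = list(k_folds(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


# Example: 10 samples, 5 outer folds, 2 inner folds per outer training set.
splits = list(nested_cv_splits(10, outer_k=5, inner_k=2))
```

    Each candidate classifier would be tuned on the inner splits and scored once on the corresponding outer test fold, and the outer scores averaged.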

  17. Prediction of metastasis from low-malignant breast cancer by gene expression profiling

    DEFF Research Database (Denmark)

    Thomassen, Mads; Tan, Qihua; Eiriksdottir, Freyja

    2007-01-01

    Promising results for prediction of outcome in breast cancer have been obtained by genome wide gene expression profiling. Some studies have suggested that an extensive overtreatment of breast cancer patients might be reduced by risk assessment with gene expression profiling. A patient group hardly examined in these studies is the low-risk patients for whom outcome is very difficult to predict with currently used methods. These patients do not receive adjuvant treatment according to the guidelines of the Danish Breast Cancer Cooperative Group (DBCG). In this study, 26 tumors from low-risk patients...... with different characteristics and risk, expression-based classification specifically developed in low-risk patients have higher predictive power in this group.

  18. Muscle myeloid type I interferon gene expression may predict therapeutic responses to rituximab in myositis patients.

    Science.gov (United States)

    Nagaraju, Kanneboyina; Ghimbovschi, Svetlana; Rayavarapu, Sree; Phadke, Aditi; Rider, Lisa G; Hoffman, Eric P; Miller, Frederick W

    2016-09-01

    To identify muscle gene expression patterns that predict rituximab responses and assess the effects of rituximab on muscle gene expression in PM and DM. In an attempt to understand the molecular mechanism of response and non-response to rituximab therapy, we performed Affymetrix gene expression array analyses on muscle biopsy specimens taken before and after rituximab therapy from eight PM and two DM patients in the Rituximab in Myositis study. We also analysed selected muscle-infiltrating cell phenotypes in these biopsies by immunohistochemical staining. Partek and Ingenuity pathway analyses assessed the gene pathways and networks. Myeloid type I IFN signature genes were expressed at higher levels at baseline in the skeletal muscle of rituximab responders than in non-responders, whereas classic non-myeloid IFN signature genes were expressed at higher levels in non-responders at baseline. Also, rituximab responders have a greater reduction of the myeloid and non-myeloid type I IFN signatures than non-responders. The decrease in the type I IFN signature following administration of rituximab may be associated with the decreases in muscle-infiltrating CD19(+) B cells and CD68(+) macrophages in responders. Our findings suggest that high levels of myeloid type I IFN gene expression in skeletal muscle predict responses to rituximab in PM/DM and that rituximab responders also have a greater decrease in the expression of these genes. These data add further evidence to recent studies defining the type I IFN signature as both a predictor of therapeutic responses and a biomarker of myositis disease activity. Published by Oxford University Press on behalf British Society for Rheumatology 2016. This work is written by US Government employees and is in the public domain in the US.

  19. Predictive gene signatures: molecular markers distinguishing colon adenomatous polyp and carcinoma.

    Directory of Open Access Journals (Sweden)

    Janice E Drew

    Full Text Available Cancers exhibit abnormal molecular signatures associated with disease initiation and progression. Molecular signatures could improve cancer screening, detection, drug development and selection of appropriate drug therapies for individual patients. Typically only very small amounts of tissue are available from patients for analysis and biopsy samples exhibit broad heterogeneity that cannot be captured using a single marker. This report details application of an in-house custom designed GenomeLab System multiplex gene expression assay, the hCellMarkerPlex, to assess predictive gene signatures of normal, adenomatous polyp and carcinoma colon tissue using archived tissue bank material. The hCellMarkerPlex incorporates twenty-one gene markers: epithelial (EZR, KRT18, NOX1, SLC9A2), proliferation (PCNA, CCND1, MS4A12), differentiation (B4GANLT2, CDX1, CDX2), apoptotic (CASP3, NOX1, NTN1), fibroblast (FSP1, COL1A1), structural (ACTG2, CNN1, DES), gene transcription (HDAC1), stem cell (LGR5), endothelial (VWF) and mucin production (MUC2). Gene signatures distinguished normal, adenomatous polyp and carcinoma. Individual gene targets significantly contributing to molecular tissue types, classifier genes, were further characterised using real-time PCR, in-situ hybridisation and immunohistochemistry revealing aberrant epithelial expression of MS4A12, LGR5, CDX2, NOX1 and SLC9A2 prior to development of carcinoma. Identified gene signatures identify aberrant epithelial expression of genes prior to cancer development using in-house custom designed gene expression multiplex assays. This approach may be used to assist in objective classification of disease initiation, staging, progression and therapeutic responses using biopsy material.

  20. PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes.

    Science.gov (United States)

    Osuna-Cruz, Cristina M; Paytuvi-Gallart, Andreu; Di Donato, Antimo; Sundesha, Vicky; Andolfo, Giuseppe; Aiese Cigliano, Riccardo; Sanseverino, Walter; Ercolano, Maria R

    2018-01-04

    The Plant Resistance Genes database (PRGdb; http://prgdb.org) has been redesigned with a new user interface, new sections, new tools and new data for genetic improvement, allowing easy access not only to the plant science research community but also to breeders who want to improve plant disease resistance. The home page offers an overview of easy-to-read search boxes that streamline data queries and directly show plant species for which data from candidate or cloned genes have been collected. Bulk data files and curated resistance gene annotations are made available for each plant species hosted. The new Gene Model view offers detailed information on each cloned resistance gene structure to highlight shared attributes with other genes. PRGdb 3.0 offers 153 reference resistance genes and 177 072 annotated candidate Pathogen Receptor Genes (PRGs). Compared to the previous release, the number of putative genes has been increased from 106 to 177 K from 76 sequenced Viridiplantae and algae genomes. The DRAGO 2 tool, which automatically annotates and predicts PRGs from DNA and amino acid sequences with high accuracy and sensitivity, has been added. BLAST search has been implemented to offer users the opportunity to annotate and compare their own sequences. The improved section on plant diseases displays useful information linked to genes and genomes to connect complementary data and better address specific needs. Through a revised and enlarged collection of data, the development of new tools and a renewed portal, PRGdb 3.0 engages the plant science community in developing a consensus plan to improve knowledge and strategies to fight diseases that afflict main crops and other plants. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering

    Directory of Open Access Journals (Sweden)

    Lei Yang

    2014-01-01

    Cliques (maximal complete subnets) in a protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPIs compensate for the data deficiencies of biological experiments. However, clique-based prediction methods depend only on the topology of the network, and the false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based prediction with gene ontology (GO) annotations to overcome this shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP), and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning.
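
    The combination of clique-based prediction and GO filtering described above can be sketched as follows. This is a minimal single-pass illustration, not the authors' method: the rule "a protein linked to all but one member of a maximal clique is predicted to interact with the remaining member" and the filter "the two proteins must share at least one GO term" are assumptions, as are all function and variable names. The abstract does not spell out its GO correcting rules.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def predict_interactions(edges, go_terms, min_size=3):
    """Predict missing edges that would extend a clique, then filter by GO.

    edges    : iterable of (protein, protein) pairs
    go_terms : dict protein -> set of GO term identifiers (hypothetical input)
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    predicted = set()
    for clique in maximal_cliques(adj):
        if len(clique) < min_size:
            continue
        for v in set(adj) - clique:
            missing = clique - adj[v]
            if len(missing) != 1:
                continue  # v must touch every clique member except one
            (u,) = missing
            # Assumed GO rule: keep the pair only if annotations overlap.
            if go_terms.get(u, set()) & go_terms.get(v, set()):
                predicted.add(frozenset((u, v)))
    return predicted
```

    Appending the predicted pairs to `edges` and calling the function again would give an iterative variant in which cliques grow from one round to the next.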

  2. Predictive gene lists for breast cancer prognosis: A topographic visualisation study

    Directory of Open Access Journals (Sweden)

    Lowe David

    2008-04-01

    Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGLs), small subsets of genes selected from the very large pool of candidates available in DNA microarray experiments, is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, Stochastic Neighbor Embedding (SNE) and Locally Linear Embedding (LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70-dimensional PGLs per patient. Classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques, and to investigate whether, a posteriori, two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between the two prognosis groups. Uncertainty and diversity across multiple gene expressions prevent unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results. Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high

  3. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

    Directory of Open Access Journals (Sweden)

    Kohlbacher Oliver

    2009-09-01

    Background: Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results: We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion: MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly and free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.

  4. Adipose Gene Expression Prior to Weight Loss Can Differentiate and Weakly Predict Dietary Responders

    Science.gov (United States)

    Mutch, David M.; Temanni, M. Ramzi; Henegar, Corneliu; Combes, Florence; Pelloux, Véronique; Holst, Claus; Sørensen, Thorkild I. A.; Astrup, Arne; Martinez, J. Alfredo; Saris, Wim H. M.; Viguerie, Nathalie; Langin, Dominique; Zucker, Jean-Daniel; Clément, Karine

    2007-01-01

    Background: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. Methodology/Principal Findings: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Using several statistical tests revealed that the gene expression profiles of responders (8–12 kg weight loss) could always be differentiated from non-responders (diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition. PMID:18094752

  5. CRC-113 gene expression signature for predicting prognosis in patients with colorectal cancer.

    Science.gov (United States)

    Nguyen, Minh Nam; Choi, Tae Gyu; Nguyen, Dinh Truong; Kim, Jin-Hwan; Jo, Yong Hwa; Shahid, Muhammad; Akter, Salima; Aryal, Saurav Nath; Yoo, Ji Youn; Ahn, Yong-Joo; Cho, Kyoung Min; Lee, Ju-Seog; Choe, Wonchae; Kang, Insug; Ha, Joohun; Kim, Sung Soo

    2015-10-13

    Colorectal cancer (CRC) is the third leading cause of global cancer mortality. Recent studies have proposed several gene signatures to predict CRC prognosis, but none of those have proven reliable for predicting prognosis in clinical practice yet due to poor reproducibility and molecular heterogeneity. Here, we have established a prognostic signature of 113 probe sets (CRC-113) that include potential biomarkers and reflect the biological and clinical characteristics. Robustness and accuracy were significantly validated in external data sets from 19 centers in five countries. In multivariate analysis, CRC-113 gene signature showed a stronger prognostic value for survival and disease recurrence in CRC patients than current clinicopathological risk factors and molecular alterations. We also demonstrated that the CRC-113 gene signature reflected both genetic and epigenetic molecular heterogeneity in CRC patients. Furthermore, incorporation of the CRC-113 gene signature into a clinical context and molecular markers further refined the selection of the CRC patients who might benefit from postoperative chemotherapy. Conclusively, CRC-113 gene signature provides new possibilities for improving prognostic models and personalized therapeutic strategies.

  6. The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection.

    Science.gov (United States)

    Tang, Zaixiang; Shen, Yueping; Zhang, Xinyan; Yi, Nengjun

    2017-01-01

    Large-scale "omics" data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, there are considerable challenges in analyzing high-dimensional molecular data, including the large number of potential molecular predictors, limited number of samples, and small effect of each predictor. We propose new Bayesian hierarchical generalized linear models, called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients, and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchical GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach integrates nice features of two popular methods, i.e., penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach can provide not only more accurate estimates of the parameters, but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors, and expression data of 4919 genes; and the ovarian cancer data set from TCGA with 362 tumors, and expression data of 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Copyright © 2017 by the Genetics Society of America.
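
    For readers unfamiliar with the prior named above, a spike-and-slab mixture double-exponential prior is commonly written as below. The notation (indicator \gamma_j, spike scale s_0, slab scale s_1, inclusion probability \theta) is an assumption chosen for illustration and does not appear in the abstract.

```latex
\beta_j \mid \gamma_j \sim \mathrm{DE}(0, S_j), \qquad
S_j = (1-\gamma_j)\, s_0 + \gamma_j\, s_1, \qquad s_1 > s_0 > 0,

p(\beta_j \mid S_j) = \frac{1}{2 S_j} \exp\!\left(-\frac{|\beta_j|}{S_j}\right), \qquad
\gamma_j \mid \theta \sim \mathrm{Bernoulli}(\theta).
```

    With \gamma_j = 1 the coefficient receives the wide slab scale s_1 and is shrunk only weakly; with \gamma_j = 0 it receives the narrow spike scale s_0 and is shrunk strongly toward zero, which matches the behaviour the abstract describes.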

  7. Genome sequence analysis of predicted polyprenol reductase gene from mangrove plant Kandelia obovata

    Science.gov (United States)

    Basyuni, M.; Sagami, H.; Baba, S.; Oku, H.

    2018-03-01

    It has previously been reported that dolichols, rather than polyprenols, predominate in mangrove leaves and roots. The occurrence of larger amounts of dolichol in the leaves of mangrove plants therefore implies that polyprenol reductase, which is responsible for the conversion of polyprenol to dolichol, may be active in mangrove leaves. Here we report an early assessment of a probable polyprenol reductase gene from the genome sequence of the mangrove plant Kandelia obovata. The functional assignment of the gene was based on a homology search of the sequences against the non-redundant (nr) peptide database of NCBI using Blastx. The degree of sequence identity between each DNA sequence and known polyprenol reductases was confirmed using the Blastx probability E-value, total score, and identity. The genome sequence data yielded three partial sequences, termed c23157 (700 bp), c23901 (960 bp), and c24171 (531 bp). The c23157 gene showed the highest similarity (61%) to the predicted polyprenol reductase 2-like from Gossypium raimondii, with an E-value of 2e-100. The second gene, c23901, exhibited high similarity (78%) to the steroid 5-alpha-reductase Det2 from Jatropha curcas, with an E-value of 2e-140. Furthermore, the c24171 gene showed the highest similarity (79%) to the polyprenol reductase 2 isoform X1 from J. curcas, with an E-value of 7e-21. The present study suggests that the c23157, c23901, and c24171 genes may encode predicted polyprenol reductases. The c23157, c23901, and c24171 genes are therefore new types of predicted polyprenol reductase from K. obovata.

  8. Using gene co-expression network analysis to predict biomarkers for chronic lymphocytic leukemia

    Directory of Open Access Journals (Sweden)

    Borlawsky Tara B

    2010-10-01

    Background: Chronic lymphocytic leukemia (CLL) is the most common adult leukemia. It is a highly heterogeneous disease, and can be divided roughly into indolent and progressive stages based on classic clinical markers. Immunoglobulin heavy chain variable region (IgVH) mutational status was found to be associated with patient survival outcome, and biomarkers linked to the IgVH status have been a focus in the CLL prognosis research field. However, biomarkers highly correlated with IgVH mutational status which can accurately predict the survival outcome are yet to be discovered. Results: In this paper, we investigate the use of gene co-expression network analysis to identify potential biomarkers for CLL. Specifically, we focused on the co-expression network involving ZAP70, a well characterized biomarker for CLL. We selected 23 microarray datasets corresponding to multiple types of cancer from the Gene Expression Omnibus (GEO) and used the frequent network mining algorithm CODENSE to identify highly connected gene co-expression networks spanning the entire genome, then evaluated the genes in the co-expression network in which ZAP70 is involved. We then applied a set of feature selection methods to further select genes which are capable of predicting IgVH mutation status from the ZAP70 co-expression network. Conclusions: We have identified a set of genes that are potential CLL prognostic biomarkers, IL2RB, CD8A, CD247, LAG3 and KLRK1, which can predict CLL patient IgVH mutational status with high accuracy. Their prognostic capabilities were cross-validated by applying these biomarker candidates to classify patients into different outcome groups using a CLL microarray dataset with clinical information.

  9. Methylation of cancer-stem-cell-associated Wnt target genes predicts poor prognosis in colorectal cancer patients

    NARCIS (Netherlands)

    de Sousa E Melo, Felipe; Colak, Selcuk; Buikhuisen, Joyce; Koster, Jan; Cameron, Kate; de Jong, Joan H.; Tuynman, Jurriaan B.; Prasetyanti, Pramudita R.; Fessler, Evelyn; van den Bergh, Saskia P.; Rodermond, Hans; Dekker, Evelien; van der Loos, Chris M.; Pals, Steven T.; van de Vijver, Marc J.; Versteeg, Rogier; Richel, Dick J.; Vermeulen, Louis; Medema, Jan Paul

    2011-01-01

    Gene signatures derived from cancer stem cells (CSCs) predict tumor recurrence for many forms of cancer. Here, we derived a gene signature for colorectal CSCs defined by high Wnt signaling activity, which in agreement with previous observations predicts poor prognosis. Surprisingly, however, we

  10. Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer

    Directory of Open Access Journals (Sweden)

    Pingzhao Hu

    2006-01-01

    biological pathways. In particular, we observed that by integrating information from the insulin signalling pathway into our prediction model, we achieved better prediction of prostate cancer. Conclusions: Our data integration methodology provides an efficient way to identify biologically sound and statistically significant pathways from gene expression data. The significant gene expression phenotypes identified in our study have the potential to characterize complex genetic alterations in prostate cancer.

  11. A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

    Directory of Open Access Journals (Sweden)

    Ruzzo Walter L

    2006-03-01

    Background: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. Methods: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. Results: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that by combining different data sources, prediction accuracy can improve significantly. Conclusion: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.
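
    The idea of learning a similarity metric as a weighted combination of base similarity measures, then voting among the nearest neighbours, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' algorithm: the weights are fitted by plain least squares (via gradient descent) against a same-class indicator, the confidence score is simply the winning vote fraction, and every name below is hypothetical.

```python
def fit_weights(pair_sims, same_class, steps=2000, lr=0.1):
    """Least-squares fit of same_class ~ w . sims by batch gradient descent.

    pair_sims  : list of tuples, one per training gene pair, holding the
                 value of each base similarity measure for that pair
    same_class : list of 0/1 labels (1 if the pair shares a function class)
    """
    m = len(pair_sims[0])
    n = len(pair_sims)
    w = [0.0] * m
    for _ in range(steps):
        grad = [0.0] * m
        for s, y in zip(pair_sims, same_class):
            err = sum(wk * sk for wk, sk in zip(w, s)) - y
            for k in range(m):
                grad[k] += err * s[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def knn_predict(target_sims, labels, w, k=3):
    """Vote among the k training genes most similar to the target.

    target_sims[i] holds the base similarities between the target gene and
    training gene i; labels[i] is that gene's class.
    Returns (predicted class, fraction of the k votes it received).
    """
    scored = sorted(
        ((sum(wk * sk for wk, sk in zip(w, s)), lab)
         for s, lab in zip(target_sims, labels)),
        reverse=True,
    )[:k]
    votes = {}
    for _, lab in scored:
        votes[lab] = votes.get(lab, 0) + 1
    best = max(votes, key=votes.get)
    return best, votes[best] / k
```

    In this sketch an uninformative base measure (one that takes the same value for same-class and different-class pairs) receives a weight near zero, so it stops influencing which neighbours are selected.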

  12. Predictive models for mutations in mismatch repair genes: implication for genetic counseling in developing countries

    Directory of Open Access Journals (Sweden)

    Monteiro Santos Erika

    2012-02-01

    Background: Lynch syndrome (LS) is the most common form of inherited predisposition to colorectal cancer (CRC), accounting for 2-5% of all CRC. LS is an autosomal dominant disease characterized by mutations in the mismatch repair genes mutL homolog 1 (MLH1), mutS homolog 2 (MSH2), postmeiotic segregation increased 1 (PMS1), post-meiotic segregation increased 2 (PMS2) and mutS homolog 6 (MSH6). Mutation risk prediction models can be incorporated into clinical practice, facilitating the decision-making process and identifying individuals for molecular investigation. This is extremely important in countries with limited economic resources. This study aims to evaluate the sensitivity and specificity of five predictive models for germline mutations in repair genes in a sample of individuals with suspected Lynch syndrome. Methods: Blood samples from 88 patients were analyzed through sequencing of the MLH1, MSH2 and MSH6 genes. The probability of detecting a mutation was calculated using the PREMM, Barnetson, MMRpro, Wijnen and Myriad models. To evaluate the sensitivity and specificity of the models, receiver operating characteristic curves were constructed. Results: Among the 88 patients included in this analysis, 31 mutations were identified: 16 were found in the MSH2 gene, 15 in the MLH1 gene, and no pathogenic mutations were identified in the MSH6 gene. The AUCs for the PREMM (0.846), Barnetson (0.850), MMRpro (0.821) and Wijnen (0.807) models did not differ significantly. The Myriad model presented a lower AUC (0.704) than the four other models evaluated. Considering thresholds of ≥ 5%, the models' sensitivity varied between 1 (Myriad) and 0.87 (Wijnen) and specificity ranged from 0 (Myriad) to 0.38 (Barnetson). Conclusions: The Barnetson, PREMM, MMRpro and Wijnen models present similar AUC. The AUC of the Myriad model is statistically inferior to the four other models.
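
    The evaluation described above (apply a probability threshold, compute sensitivity and specificity, and summarise the ROC curve by its AUC) can be sketched as follows. The function names are hypothetical and any probabilities passed in are the caller's own; nothing here reproduces the study's data or the internals of the PREMM, Barnetson, MMRpro, Wijnen or Myriad models.

```python
def sensitivity_specificity(probs, carriers, threshold=0.05):
    """Flag a patient as 'likely carrier' when the model probability
    is at or above the threshold, then compare with sequencing results.

    probs    : predicted mutation probabilities, one per patient
    carriers : True if sequencing found a pathogenic mutation
    """
    tp = fn = tn = fp = 0
    for p, is_carrier in zip(probs, carriers):
        flagged = p >= threshold
        if is_carrier and flagged:
            tp += 1
        elif is_carrier:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(probs, carriers):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the fraction of carrier/non-carrier pairs in which the carrier has the
    higher predicted probability, counting ties as one half."""
    pos = [p for p, c in zip(probs, carriers) if c]
    neg = [p for p, c in zip(probs, carriers) if not c]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

    A model that assigns every patient a probability at or above the threshold yields sensitivity 1 and specificity 0, which is the pattern the abstract reports for the Myriad model at the ≥ 5% threshold.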

  13. Predictive models for mutations in mismatch repair genes: implication for genetic counseling in developing countries

    Energy Technology Data Exchange (ETDEWEB)

    Monteiro Santos, Erika Maria [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); International Center of Research and Training (CIPE), AC Camargo Hospital, Sao Paulo (Brazil); Silva Junior, Wilson Araujo da [Sao Paulo University, Department of Genetics, Medical School of Ribeirao Preto, Ribeirao Preto (Brazil); Carraro, Dirce Maria [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); International Center of Research and Training (CIPE), AC Camargo Hospital, Sao Paulo (Brazil); Rossi, Benedito Mauro; Valentin, Mev Dominguez [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Carneiro, Felipe [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); International Center of Research and Training (CIPE), AC Camargo Hospital, Sao Paulo (Brazil); Oliveira, Ligia Petrolini de [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Oliveira Ferreira, Fabio de; Junior, Samuel Aguiar [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Hereditary Colorectal Cancer Registry, AC Camargo Hospital, Sao Paulo (Brazil); Nakagawa, Wilson Toshihiko [Hereditary Colorectal Cancer Registry, AC Camargo Hospital, Sao Paulo (Brazil); Gomy, Israel [Graduation Program, AC Camargo Hospital, Sao Paulo (Brazil); Sao Paulo University, Department of Genetics, Medical School of Ribeirao Preto, Ribeirao Preto (Brazil); Faria Ferraz, Victor Evangelista de [Sao Paulo University, Department of Genetics, Medical School of Ribeirao Preto, Ribeirao Preto (Brazil)

    2012-02-09

    Lynch syndrome (LS) is the most common form of inherited predisposition to colorectal cancer (CRC), accounting for 2-5% of all CRC. LS is an autosomal dominant disease characterized by mutations in the mismatch repair genes mutL homolog 1 (MLH1), mutS homolog 2 (MSH2), postmeiotic segregation increased 1 (PMS1), post-meiotic segregation increased 2 (PMS2) and mutS homolog 6 (MSH6). Mutation risk prediction models can be incorporated into clinical practice, facilitating the decision-making process and identifying individuals for molecular investigation. This is extremely important in countries with limited economic resources. This study aims to evaluate the sensitivity and specificity of five predictive models for germline mutations in repair genes in a sample of individuals with suspected Lynch syndrome. Blood samples from 88 patients were analyzed through sequencing of the MLH1, MSH2 and MSH6 genes. The probability of detecting a mutation was calculated using the PREMM, Barnetson, MMRpro, Wijnen and Myriad models. To evaluate the sensitivity and specificity of the models, receiver operating characteristic curves were constructed. Among the 88 patients included in this analysis, 31 mutations were identified: 16 were found in the MSH2 gene, 15 in the MLH1 gene, and no pathogenic mutations were identified in the MSH6 gene. The AUCs for the PREMM (0.846), Barnetson (0.850), MMRpro (0.821) and Wijnen (0.807) models did not differ significantly. The Myriad model presented a lower AUC (0.704) than the four other models evaluated. Considering thresholds of ≥ 5%, the models' sensitivity varied between 1 (Myriad) and 0.87 (Wijnen) and specificity ranged from 0 (Myriad) to 0.38 (Barnetson). The Barnetson, PREMM, MMRpro and Wijnen models present similar AUC. The AUC of the Myriad model is statistically inferior to the four other models.

  14. Predictive models for mutations in mismatch repair genes: implication for genetic counseling in developing countries

    International Nuclear Information System (INIS)

    Monteiro Santos, Erika Maria; Silva Junior, Wilson Araujo da; Carraro, Dirce Maria; Rossi, Benedito Mauro; Valentin, Mev Dominguez; Carneiro, Felipe; Oliveira, Ligia Petrolini de; Oliveira Ferreira, Fabio de; Junior, Samuel Aguiar; Nakagawa, Wilson Toshihiko; Gomy, Israel; Faria Ferraz, Victor Evangelista de

    2012-01-01

    Lynch syndrome (LS) is the most common form of inherited predisposition to colorectal cancer (CRC), accounting for 2-5% of all CRC. LS is an autosomal dominant disease characterized by mutations in the mismatch repair genes mutL homolog 1 (MLH1), mutS homolog 2 (MSH2), postmeiotic segregation increased 1 (PMS1), post-meiotic segregation increased 2 (PMS2) and mutS homolog 6 (MSH6). Mutation risk prediction models can be incorporated into clinical practice, facilitating the decision-making process and identifying individuals for molecular investigation. This is extremely important in countries with limited economic resources. This study aims to evaluate the sensitivity and specificity of five predictive models for germline mutations in repair genes in a sample of individuals with suspected Lynch syndrome. Blood samples from 88 patients were analyzed through sequencing of the MLH1, MSH2 and MSH6 genes. The probability of detecting a mutation was calculated using the PREMM, Barnetson, MMRpro, Wijnen and Myriad models. To evaluate the sensitivity and specificity of the models, receiver operating characteristic curves were constructed. Among the 88 patients included in this analysis, 31 mutations were identified: 16 were found in the MSH2 gene, 15 in the MLH1 gene, and no pathogenic mutations were identified in the MSH6 gene. The AUCs for the PREMM (0.846), Barnetson (0.850), MMRpro (0.821) and Wijnen (0.807) models did not differ significantly. The Myriad model presented a lower AUC (0.704) than the four other models evaluated. Considering thresholds of ≥ 5%, the models' sensitivity varied between 1 (Myriad) and 0.87 (Wijnen) and specificity ranged from 0 (Myriad) to 0.38 (Barnetson). The Barnetson, PREMM, MMRpro and Wijnen models present similar AUC. The AUC of the Myriad model is statistically inferior to the four other models.

  15. A postprocessing method in the HMC framework for predicting gene function based on biological instrumental data

    Science.gov (United States)

    Feng, Shou; Fu, Ping; Zheng, Wenbin

    2018-03-01

    Predicting gene function based on biological instrumental data is a complicated and challenging hierarchical multi-label classification (HMC) problem. When using local approach methods to solve this problem, a preliminary results processing method is usually needed. This paper proposed a novel preliminary results processing method called the nodes interaction method. The nodes interaction method revises the preliminary results and guarantees that the predictions are consistent with the hierarchy constraint. This method exploits the label dependency and considers the hierarchical interaction between nodes when making decisions based on the Bayesian network in its first phase. In the second phase, this method further adjusts the results according to the hierarchy constraint. Implementing the nodes interaction method in the HMC framework also enhances the HMC performance for solving the gene function prediction problem based on the Gene Ontology (GO), the hierarchy of which is a directed acyclic graph that is more difficult to tackle. The experimental results validate the promising performance of the proposed method compared to state-of-the-art methods on eight benchmark yeast data sets annotated by the GO.
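
    The hierarchy constraint mentioned above is commonly read as: a term may not be predicted more confidently than any of its parent terms. A minimal sketch of that kind of consistency correction on a directed acyclic graph is shown below. It is a generic post-processing step, not the paper's nodes interaction method or its Bayesian-network phase, and the function name and input layout are assumptions. It also assumes every parent term has its own entry in `scores`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def enforce_hierarchy(scores, parents):
    """Cap each term's score at the lowest (already corrected) parent score.

    scores  : dict term -> preliminary score in [0, 1] from a local classifier
    parents : dict term -> iterable of parent terms (a DAG, as in the GO)
    """
    graph = {term: tuple(parents.get(term, ())) for term in scores}
    fixed = {}
    # static_order() yields every parent before any of its children.
    for term in TopologicalSorter(graph).static_order():
        cap = min((fixed[p] for p in parents.get(term, ())), default=1.0)
        fixed[term] = min(scores[term], cap)
    return fixed
```

    Processing terms in topological order matters for a DAG: a term with several parents is capped by the smallest corrected score among them, so the correction propagates down every path from the root.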

  16. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    Science.gov (United States)

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
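
    The key idea stated above, choosing initial centroids that lie in dense regions and are well separated, can be sketched without any randomness as follows. The specific density estimate (inverse mean distance to the m nearest neighbours) and the selection score (density times distance to the nearest centroid already chosen) are assumptions made for illustration; the abstract does not give the paper's exact criteria.

```python
import math


def initial_centroids(points, k, m=2):
    """Deterministically pick k seeds that are dense and mutually distant.

    points : list of equal-length numeric tuples (needs more than m points)
    k      : number of clusters
    m      : number of nearest neighbours used for the density estimate
    """
    n = len(points)
    density = []
    for i in range(n):
        nearest = sorted(
            math.dist(points[i], points[j]) for j in range(n) if j != i
        )[:m]
        density.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))

    # First seed: the densest point (ties resolved by lowest index).
    chosen = [max(range(n), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            separation = min(math.dist(points[i], points[c]) for c in chosen)
            return density[i] * separation
        chosen.append(max(range(n), key=score))
    return [points[i] for i in chosen]
```

    Because ties are broken by index and no random numbers are drawn, repeated runs on the same data return the same seeds, and a standard K-Means iteration started from them returns the same clustering each time.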

  17. Semi-supervised prediction of gene regulatory networks using machine learning algorithms.

    Science.gov (United States)

    Patel, Nihir; Wang, Jason T L

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  18. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

    Science.gov (United States)

    Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke

    2009-02-15

    Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. PFP is available as a web service at http

  19. Expression of estrogen-related gene markers in breast cancer tissue predicts aromatase inhibitor responsiveness.

    Directory of Open Access Journals (Sweden)

    Irene Moy

    Aromatase inhibitors (AIs) are the most effective class of drugs in the endocrine treatment of breast cancer, with an approximate 50% treatment response rate. Our objective was to determine whether intratumoral expression levels of estrogen-related genes are predictive of AI responsiveness in postmenopausal women with breast cancer. Primary breast carcinomas were obtained from 112 women who received AI therapy after failing adjuvant tamoxifen therapy and developing recurrent breast cancer. Tumor ERα and PR protein expression were analyzed by immunohistochemistry (IHC). Messenger RNA (mRNA) levels of 5 estrogen-related genes (AKR1C3, aromatase, ERα, and 2 estradiol/ERα target genes, BRCA1 and PR) were measured by real-time PCR. Tumor protein and mRNA levels were compared with breast cancer progression rates to determine predictive accuracy. Responsiveness to AI therapy, defined as the combined complete response, partial response, and stable disease rates for at least 6 months, was 51%; rates were 56% in ERα-IHC-positive and 14% in ERα-IHC-negative tumors. Levels of ERα, PR, or BRCA1 mRNA were independently predictive for responsiveness to AI. In cross-validated analyses, a combined measurement of tumor ERα and PR mRNA levels yielded superior specificity (36%) and identical sensitivity (96%) compared with current clinical practice (ERα/PR-IHC). In patients with ERα/PR-IHC-negative tumors, analysis of mRNA expression revealed either non-significant trends or statistically significant positive predictive values for AI responsiveness. In conclusion, expression levels of estrogen-related mRNAs are predictive for AI responsiveness in postmenopausal women with breast cancer, and mRNA expression analysis may improve patient selection.

  20. Prediction of novel target genes and pathways involved in irinotecan-resistant colorectal cancer.

    Directory of Open Access Journals (Sweden)

    Precious Takondwa Makondi

    Acquired drug resistance to the chemotherapeutic drug irinotecan (the active metabolite of which is SN-38) is one of the significant obstacles in the treatment of advanced colorectal cancer (CRC). The molecular mechanisms or targets mediating irinotecan resistance are still unclear. Irinotecan response biomarkers are urgently needed to improve CRC patients' therapy. The Gene Expression Omnibus dataset GSE42387, which contains the gene expression profiles of parental and irinotecan-resistant HCT-116 cell lines, was used. Differentially expressed genes (DEGs) between parental and irinotecan-resistant cells were identified, and protein-protein interaction (PPI), gene ontology (GO) and pathway analyses were performed to identify the overall biological changes. The most common DEGs in the PPIs, GOs and pathways were identified and were validated clinically by their ability to predict overall survival and disease free survival. The gene-gene expression correlation and gene-resistance correlation were also evaluated in CRC patients using The Cancer Genome Atlas (TCGA) data. In total, 135 DEGs were identified, of which 36 were upregulated and 99 were downregulated. After mapping the PPI networks, the GOs and the pathways, nine genes (GNAS, PRKACB, MECOM, PLA2G4C, BMP6, BDNF, DLG4, FGF2 and FGF9) were found to be commonly enriched. Signal transduction was the most significant GO and the MAPK pathway was the most significant pathway. The five genes in the MAPK pathway (FGF2, FGF9, PRKACB, MECOM and PLA2G4C) were all also contained in the signal transduction GO, and all were upregulated. Expression of FGF2, FGF9 and MECOM, but not of PRKACB and PLA2G4C, was highly associated with CRC patients' survival rate. In addition, FGF9 was also associated with irinotecan resistance and poor disease free survival. FGF2, FGF9 and PRKACB were positively correlated with each other while MECOM correlated positively with FGF9 and PLA2G4C, and correlated negatively with FGF2 and PRKACB after doing gene-gene

  1. Computational prediction of miRNA genes from small RNA sequencing data

    Directory of Open Access Journals (Sweden)

    Wenjing Kang

    2015-01-01

    Next-generation sequencing now for the first time allows researchers to gauge the depth and variation of entire transcriptomes. However, now that rare transcripts present in cells at single copies can be detected, more advanced computational tools are needed to accurately annotate and profile them. miRNAs are 22 nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq), which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remains a non-trivial computational problem. Here we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without a reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field.

  2. Establishment of a 12-gene expression signature to predict colon cancer prognosis

    Directory of Open Access Journals (Sweden)

    Dalong Sun

    2018-06-01

    A robust and accurate gene expression signature is essential to help oncologists determine which subset of patients at a similar Tumor-Lymph Node-Metastasis (TNM) stage has high recurrence risk and could benefit from adjuvant therapies. Here we applied a two-step supervised machine-learning method and established a 12-gene expression signature to precisely predict colon adenocarcinoma (COAD) prognosis by using COAD RNA-seq transcriptome data from The Cancer Genome Atlas (TCGA). The predictive performance of the 12-gene signature was validated with two independent gene expression microarray datasets: GSE39582 includes 566 COAD cases for the development of six molecular subtypes with distinct clinical, molecular and survival characteristics; GSE17538 is a dataset containing 232 colon cancer patients for the generation of a metastasis gene expression profile to predict recurrence and death in COAD patients. The signature could effectively separate the poor prognosis patients from the good prognosis group (disease specific survival (DSS): Kaplan Meier (KM) Log Rank p = 0.0034; overall survival (OS): KM Log Rank p = 0.0336) in GSE17538. For patients with a proficient mismatch repair system (pMMR) in GSE39582, the signature could also effectively distinguish the high risk group from the low risk group (OS: KM Log Rank p = 0.005; relapse free survival (RFS): KM Log Rank p = 0.022). Interestingly, advanced stage patients were significantly enriched in the high 12-gene score group (Fisher's exact test p = 0.0003). After stage stratification, the signature could still distinguish poor prognosis patients in GSE17538 from good prognosis patients within stage II (Log Rank p = 0.01) and stage II & III (Log Rank p = 0.017) in the outcome of DFS. Within stage III or II/III pMMR patients treated with adjuvant chemotherapies (ACT), patients with a higher 12-gene score showed poorer prognosis (III, OS: KM Log Rank p = 0.046; III & II, OS: KM Log Rank p = 0.041). Among stage II/III pMMR patients

  3. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress

    Directory of Open Access Journals (Sweden)

    Ching-Hsue Cheng

    2018-01-01

    Financial distress prediction is an important and challenging research topic in the financial field. Many methods currently exist for predicting firm bankruptcy and financial crisis, including artificial intelligence and traditional statistical methods, and past studies have shown that artificial intelligence methods give better prediction results than traditional statistical methods. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages, including the following: (i) the proposed model differs from previous models, which lack the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high-dimensional data; and (iii) the proposed model can generate rules and mathematical formulas of financial distress to provide references for investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.

  4. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress

    Science.gov (United States)

    2018-01-01

    Financial distress prediction is an important and challenging research topic in the financial field. Many methods currently exist for predicting firm bankruptcy and financial crisis, including artificial intelligence and traditional statistical methods, and past studies have shown that artificial intelligence methods give better prediction results than traditional statistical methods. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages, including the following: (i) the proposed model differs from previous models, which lack the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high-dimensional data; and (iii) the proposed model can generate rules and mathematical formulas of financial distress to provide references for investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies. PMID:29765399

  5. Predicting gene function using hierarchical multi-label decision tree ensembles

    Directory of Open Access Journals (Sweden)

    Kocev Dragi

    2010-01-01

    Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

  6. A Seasonal Time-Series Model Based on Gene Expression Programming for Predicting Financial Distress.

    Science.gov (United States)

    Cheng, Ching-Hsue; Chan, Chia-Pang; Yang, Jun-He

    2018-01-01

    Financial distress prediction is an important and challenging research topic in the financial field. Many methods currently exist for predicting firm bankruptcy and financial crisis, including artificial intelligence and traditional statistical methods, and past studies have shown that artificial intelligence methods give better prediction results than traditional statistical methods. Financial statements are quarterly reports; hence, the financial crisis of companies is seasonal time-series data, and the attribute data affecting the financial distress of companies is nonlinear and nonstationary time-series data with fluctuations. Therefore, this study employed a nonlinear attribute selection method to build a nonlinear financial distress prediction model: that is, this paper proposed a novel seasonal time-series gene expression programming model for predicting the financial distress of companies. The proposed model has several advantages, including the following: (i) the proposed model differs from previous models, which lack the concept of time series; (ii) the proposed integrated attribute selection method can find the core attributes and reduce high-dimensional data; and (iii) the proposed model can generate rules and mathematical formulas of financial distress to provide references for investors and decision makers. The results show that the proposed method is better than the listed classifiers under three criteria; hence, the proposed model has competitive advantages in predicting the financial distress of companies.

  7. False positive reduction in protein-protein interaction predictions using gene ontology annotations

    Directory of Open Access Journals (Sweden)

    Lin Yen-Han

    2007-07-01

    Background: Many crucial cellular operations such as metabolism, signalling, and regulation are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for the lack of solid protein-protein interaction information is poor agreement between experimental findings and computational sets that, in turn, comes from the large number of false positive predictions in computational approaches. Reduction of false positive predictions and enhancement of the true positive fraction of computationally predicted protein-protein interaction datasets based on highly confident experimental results have not been adequately investigated. Results: Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism are 48.32% and 46.49% (averaged over the four datasets) in yeast and worm, respectively. Based on the eight top-ranking keywords and co-localization of interacting proteins, a set of two knowledge rules was deduced and applied to remove false positive protein pairs. The 'strength', a measure of improvement provided by the rules, was defined based on the signal-to-noise ratio and implemented to measure the applicability of the knowledge rules to the predicted PPI datasets. Depending on the employed PPI-predicting methods, the strength varies between two- and ten-fold that of randomly removing protein pairs from the datasets. Conclusion: Gene Ontology annotations along with the deduced knowledge rules could be implemented to partially

  8. A Computational Gene Expression Score for Predicting Immune Injury in Renal Allografts.

    Directory of Open Access Journals (Sweden)

    Tara K Sigdel

    Whole genome microarray meta-analyses of 1030 kidney, heart, lung and liver allograft biopsies identified a common immune response module (CRM) of 11 genes that define acute rejection (AR) across different engrafted tissues. We evaluated if the CRM genes can provide a molecular microscope to quantify graft injury in AR and predict risk of progressive interstitial fibrosis and tubular atrophy (IFTA) in histologically normal kidney biopsies. Computational modeling was done on tissue qPCR based gene expression measurements for the 11 CRM genes in 146 independent renal allografts from 122 unique patients with AR (n = 54) and no-AR (n = 92). 24 demographically matched patients with no-AR had 6 and 24 month paired protocol biopsies; all had histologically normal 6 month biopsies, and 12 had evidence of progressive IFTA (pIFTA) on their 24 month biopsies. Results were correlated with demographic, clinical and pathology variables. The 11 gene qPCR based tissue CRM score (tCRM) was significantly increased in AR (5.68 ± 0.91) when compared to STA (1.29 ± 0.28; p < 0.001) and in pIFTA (7.94 ± 2.278 versus 2.28 ± 0.66; p = 0.04), with greatest significance for CXCL9 and CXCL10 in AR (p < 0.001) and CD6 (p < 0.01), CXCL9 (p < 0.05), and LCK (p < 0.01) in pIFTA. tCRM was a significant independent correlate of biopsy confirmed AR (p < 0.001; AUC of 0.900; 95% CI = 0.705-903). Gene expression modeling of 6 month biopsies across 7/11 genes (CD6, INPP5D, ISG20, NKG7, PSMB9, RUNX3, and TAP1) significantly (p = 0.037) predicted the development of pIFTA at 24 months. Genome-wide tissue gene expression data mining has supported the development of a tCRM-qPCR based assay for evaluating graft immune inflammation. The tCRM score quantifies injury in AR and stratifies patients at increased risk of future pIFTA prior to any perturbation of graft function or histology.

  9. Predicting acute cardiac rejection from donor heart and pre-transplant recipient blood gene expression.

    Science.gov (United States)

    Hollander, Zsuzsanna; Chen, Virginia; Sidhu, Keerat; Lin, David; Ng, Raymond T; Balshaw, Robert; Cohen-Freue, Gabriela V; Ignaszewski, Andrew; Imai, Carol; Kaan, Annemarie; Tebbutt, Scott J; Wilson-McManus, Janet E; McMaster, Robert W; Keown, Paul A; McManus, Bruce M

    2013-02-01

    Acute rejection in cardiac transplant patients remains a contributory factor to limited survival of implanted hearts. Currently, there are no biomarkers in clinical use that can predict, at the time of transplantation, the likelihood of post-transplant acute cellular rejection. Such a development would be of great value in personalizing immunosuppressive treatment. Recipient age, donor age, cold ischemic time, warm ischemic time, panel-reactive antibody, gender mismatch, blood type mismatch and human leukocyte antigens (HLA-A, -B and -DR) mismatch between recipients and donors were tested in 53 heart transplant patients for their power to predict post-transplant acute cellular rejection. Donor transplant biopsy and recipient pre-transplant blood were also examined for the presence of genomic biomarkers in 7 rejection and 11 non-rejection patients, using non-targeted data mining techniques. The biomarker based on the 8 clinical variables had an area under the receiver operating characteristic curve (AUC) of 0.53. The pre-transplant recipient blood gene-based panel did not yield better performance, but the donor heart tissue gene-based panel had an AUC = 0.78. A combination of 25 probe sets from the transplant donor biopsy and 18 probe sets from the pre-transplant recipient whole blood had an AUC = 0.90. Biologic pathways implicated include VEGF- and EGFR-signaling, and MAPK. Based on this study, the best predictive biomarker panel contains genes from recipient whole blood and donor myocardial tissue. This panel provides clinically relevant prediction power and, if validated, may personalize immunosuppressive treatment and rejection monitoring. Copyright © 2013 International Society for Heart and Lung Transplantation. Published by Elsevier Inc. All rights reserved.
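    The AUC values quoted above (0.53, 0.78, 0.90) are areas under the receiver operating characteristic curve. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the scores in the usage note are invented for illustration; they are not data from the study.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))
```

    For example, `roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])` ranks 11 of the 12 pairs correctly. An AUC of 0.5 corresponds to a panel that ranks cases no better than chance, which is why the clinical-variable panel at 0.53 is described as weak.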

  10. Comparative analysis of codon usage patterns and identification of predicted highly expressed genes in five Salmonella genomes

    Directory of Open Access Journals (Sweden)

    Mondal U

    2008-01-01

    Purpose: To analyse codon usage patterns of five complete genomes of Salmonella, predict highly expressed genes, examine horizontally transferred pathogenicity-related genes to detect their presence in the strains, and scrutinize the nature of highly expressed genes to draw inferences about their lifestyle. Methods: Protein coding genes, ribosomal protein genes, and pathogenicity-related genes were analysed with Codon W and CAI (codon adaptation index) Calculator. Results: Translational efficiency plays a role in codon usage variation in Salmonella genes. Low bias was noticed in most of the genes. GC3 (guanine cytosine at third position) composition does not influence codon usage variation in the genes of these Salmonella strains. Among the cluster of orthologous groups (COGs), translation, ribosomal structure and biogenesis [J], and energy production and conversion [C] contained the highest number of potentially highly expressed (PHX) genes. Correspondence analysis reveals the conserved nature of the genes. Highly expressed genes were detected. Conclusions: Selection for translational efficiency is the major source of variation of codon usage in the genes of Salmonella. Evolution of pathogenicity-related genes as a unit suggests their ability to infect and exist as a pathogen. The presence of many PHX genes in the information storage and processing category of COGs indicated their lifestyle and revealed that they were not subjected to genome reduction.
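    GC3, one of the quantities reported above, is the G+C content at third codon positions. A minimal sketch of the calculation is given below; it is not the Codon W implementation. The function name `gc3` is hypothetical, and the sketch assumes an in-frame coding sequence and counts every codon, whereas some tools restrict the count to synonymous codons only.

```python
def gc3(cds):
    """Fraction of codons in an in-frame coding sequence whose third base is G or C."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

    For instance, the codons of `"ATGGCCAAATTC"` are ATG, GCC, AAA and TTC, so three of the four third positions are G or C.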

  11. CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

    Science.gov (United States)

    Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A

    2012-07-01

    Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

  12. Pancreatic cancer circulating tumour cells express a cell motility gene signature that predicts survival after surgery

    International Nuclear Information System (INIS)

    Sergeant, Gregory; Eijsden, Rudy van; Roskams, Tania; Van Duppen, Victor; Topal, Baki

    2012-01-01

    (95% CI) = 1.366 (1.004 – 1.861)). Pancreatic CTC isolated from blood samples using FACS-based negative depletion express a cell motility gene signature. Expression of this newly defined cell motility gene signature in the primary tumour can predict survival of patients undergoing surgical resection for pancreatic cancer. ClinicalTrials.gov NCT00495924

  13. A hemocyte gene expression signature correlated with predictive capacity of oysters to survive Vibrio infections

    Directory of Open Access Journals (Sweden)

    Rosa Rafael

    2012-06-01

    Background: The complex balance between environmental and host factors is an important determinant of susceptibility to infection. Disturbances of this equilibrium may result in multifactorial diseases as illustrated by the summer mortality syndrome, a worldwide and complex phenomenon that affects the oyster Crassostrea gigas. The summer mortality syndrome reveals a physiological intolerance making this oyster species susceptible to diseases. Exploration of the genetic basis governing oyster resistance or susceptibility to infections is thus a major goal for understanding field mortality events. In this context, we used high-throughput genomic approaches to identify genetic traits that may characterize inherent survival capacities in C. gigas. Results: Using digital gene expression (DGE), we analyzed the transcriptomes of hemocytes (immunocompetent cells) of oysters able or not able to survive infections by Vibrio species shown to be involved in summer mortalities. Hemocytes were nonlethally collected from oysters before Vibrio experimental infection, and two DGE libraries were generated from individuals that survived or did not survive. Exploration of DGE data and microfluidic qPCR analyses at individual level showed an extraordinary polymorphism in gene expressions, but also a set of hemocyte-expressed genes whose basal mRNA levels discriminate oyster capacity to survive infections by the pathogenic V. splendidus LGP32. Finally, we identified a signature of 14 genes that predicted oyster survival capacity. Their expressions are likely driven by distinct transcriptional regulation processes associated or not associated with gene copy number variation (CNV). Conclusions: We provide here for the first time in oyster a gene expression survival signature that represents a useful tool for understanding mortality events and for assessing genetic traits of interest for disease resistance selection programs.

  14. Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans

    Science.gov (United States)

    Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo

    2012-01-01

    Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the amount of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (i.e: Actinobacteria, Gammaproteobacteria, Firmicutes, etc.). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), that displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification on the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122

  15. A six-gene signature predicts survival of patients with localized pancreatic ductal adenocarcinoma.

    Directory of Open Access Journals (Sweden)

    Jeran K Stratford

    2010-07-01

    Pancreatic ductal adenocarcinoma (PDAC) remains a lethal disease. For patients with localized PDAC, surgery is the best option, but with a median survival of less than 2 years and a difficult and prolonged postoperative course for most, there is an urgent need to better identify patients who have the most aggressive disease. We analyzed the gene expression profiles of primary tumors from patients with localized compared to metastatic disease and identified a six-gene signature associated with metastatic disease. We evaluated the prognostic potential of this signature in a training set of 34 patients with localized and resected PDAC and selected a cut-point associated with outcome using X-tile. We then applied this cut-point to an independent test set of 67 patients with localized and resected PDAC and found that our signature was independently predictive of survival and superior to established clinical prognostic factors such as grade, tumor size, and nodal status, with a hazard ratio of 4.1 (95% confidence interval [CI] 1.7-10.0). Patients defined to be high-risk by the six-gene signature had a 1-year survival rate of 55% compared to 91% in the low-risk group. Our six-gene signature may be used to better stage PDAC patients, to assist in the difficult treatment decisions of surgery, and to select patients whose tumor biology may benefit most from neoadjuvant therapy. The use of this six-gene signature should be investigated in prospective patient cohorts, and if confirmed, in future PDAC clinical trials, its potential as a biomarker should be investigated. Genes in this signature, or the pathways that they fall into, may represent new therapeutic targets.

  16. Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients.

    Science.gov (United States)

    Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock

    2017-09-29

    Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients.
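
    The C-index mentioned above measures how often a risk score orders patients consistently with their observed survival. The following is a minimal sketch of Harrell's concordance index, not the code used in the study; the function name `concordance_index` and the toy data are assumptions made for illustration, and pairs with tied survival times are simply skipped as a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; a pair is only comparable
        # when the shorter time corresponds to an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [0.9, 0.7, 0.4, 0.1]))
```

    A value near 1 indicates good discrimination and 0.5 corresponds to random ordering. Similar values in training and validation cohorts are what the abstract interprets as a sign of limited overfitting.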

  17. Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

    Science.gov (United States)

    Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

    2013-01-01

    Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on Github, a cloud image is available, and an example implementation can be seen at.

  18. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

    Science.gov (United States)

    Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

    2011-08-01

    The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstrating both their classification power and their reproducibility across independent datasets is essential to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms.

  19. Characterizing haploinsufficiency of SHELL gene to improve fruit form prediction in introgressive hybrids of oil palm.

    Science.gov (United States)

    Teh, Chee-Keng; Muaz, Siti Dalila; Tangaya, Praveena; Fong, Po-Yee; Ong, Ai-Ling; Mayes, Sean; Chew, Fook-Tim; Kulaveerasingam, Harikrishna; Appleton, David

    2017-06-08

    The fundamental trait in selective breeding of oil palm (Elaeis guineensis Jacq.) is the shell thickness surrounding the kernel. The monogenic shell thickness is inversely correlated to mesocarp thickness, where the crude palm oil accumulates. Commercial thin-shelled tenera derived from thick-shelled dura × shell-less pisifera generally contain 30% higher oil per bunch. Two mutations, sh MPOB (M1) and sh AVROS (M2) in the SHELL gene - a type II MADS-box transcription factor mainly present in AVROS and Nigerian origins, were reported to be responsible for different fruit forms. In this study, we have tested 1,339 samples maintained in Sime Darby Plantation using both mutations. Five genotype-phenotype discrepancies and eight controls were then re-tested with all five reported mutations (sh AVROS , sh MPOB , sh MPOB2 , sh MPOB3 and sh MPOB4 ) within the same gene. The integration of genotypic data, pedigree records and shell formation model further explained the haploinsufficiency effect on the SHELL gene with different numbers of functional copies. Some rare mutations were also identified, suggesting a need to further confirm the existence of cis-compound mutations in the gene. With this, the prediction accuracy of fruit forms can be further improved, especially in introgressive hybrids of oil palm. Understanding causative variant segregation is extremely important, even for monogenic traits such as shell thickness in oil palm.

  20. Predictive minimum description length principle approach to inferring gene regulatory networks.

    Science.gov (United States)

    Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping

    2011-01-01

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is determining the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as a control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and the predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes, and the PMDL principle attempts to determine the best MI threshold without the need for a user-specified fine tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared to the existing MDL algorithm.
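
    To make the role of the MI threshold concrete, the sketch below computes mutual information between discretized expression profiles and keeps gene pairs whose MI reaches a cut-off. It is an illustration only: the threshold here is fixed by hand, whereas the abstract describes choosing it with the PMDL principle, and the names `mutual_information`, `infer_edges` and the toy profiles are assumptions rather than part of the published algorithm.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two discretized profiles."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        p_a = count_x[a] / n
        p_b = count_y[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def infer_edges(profiles, threshold):
    """Return undirected gene pairs whose MI is at least `threshold`.

    profiles  -- dict mapping gene name to a list of discretized
                 expression states (same length for every gene)
    threshold -- hand-picked MI cut-off (hypothetical; the abstract's
                 method selects it via PMDL instead)
    """
    genes = sorted(profiles)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if mutual_information(profiles[g], profiles[h]) >= threshold:
                edges.append((g, h))
    return edges


# Hypothetical binary (off/on) expression states over four time points.
profiles = {
    "geneA": [0, 0, 1, 1],
    "geneB": [0, 0, 1, 1],  # identical to geneA: MI = 1 bit
    "geneC": [0, 1, 0, 1],  # independent of geneA: MI = 0 bits
}
print(infer_edges(profiles, threshold=0.5))
```

    Lowering the threshold adds edges (and false positives) while raising it removes them, which is why an automatic, data-driven choice of the threshold is the central issue in the abstract.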

  1. Radiation-induced gene expression in human subcutaneous fibroblasts is predictive of radiation-induced fibrosis

    DEFF Research Database (Denmark)

    Rødningen, Olaug Kristin; Børresen-Dale, Anne-Lise; Alsner, Jan

    2008-01-01

    BACKGROUND AND PURPOSE: Breast cancer patients show a large variation in normal tissue reactions after ionizing radiation (IR) therapy. One of the most common long-term adverse effects of ionizing radiotherapy is radiation-induced fibrosis (RIF), and several attempts have been made over the last...... years to develop predictive assays for RIF. Our aim was to identify basal and radiation-induced transcriptional profiles in fibroblasts from breast cancer patients that might be related to the individual risk of RIF in these patients. MATERIALS AND METHODS: Fibroblast cell lines from 31 individuals......-treated fibroblasts. Transcriptional differences in basal and radiation-induced gene expression profiles were investigated using 15K cDNA microarrays, and results analyzed by both SAM and PAM. RESULTS: Sixty differentially expressed genes were identified by applying SAM on 10 patients with the highest risk of RIF...

  2. Angiotensinogen gene polymorphism predicts hypertension, and iridological constitutional classification enhances the risk for hypertension in Koreans.

    Science.gov (United States)

    Cho, Joo-Jang; Hwang, Woo-Jun; Hong, Seung-Heon; Jeong, Hyun-Ja; Lee, Hye-Jung; Kim, Hyung-Min; Um, Jae-Young

    2008-05-01

    This study investigated the relationship between iridological constitution and angiotensinogen (AGN) gene polymorphism in hypertensives. In addition to angiotensin converting enzyme gene, AGN genotype is also one of the most well studied genetic markers of hypertension. Furthermore, iridology, one of complementary and alternative medicine, is the diagnosis of the medical conditions through noting irregularities of the pigmentation in the iris. Iridological constitution has a strong familial aggregation and is implicated in heredity. Therefore, the study classified 87 hypertensive patients with familial history of cerebral infarction and controls (n = 88) according to Iris constitution, and determined AGN genotype. As a result, the AGN/TT genotype was associated with hypertension (chi2 = 13.413, p iridological constitutional classification increased the relative risk for hypertension in the subjects with AGN/T allele. These results suggest that AGN polymorphism predicts hypertension, and iridological constitutional classification enhances the risk for hypertension associated with AGN/T in a Korean population.

  3. A 7 gene expression score predicts for radiation response in cancer cervix

    International Nuclear Information System (INIS)

    Rajkumar, Thangarajan; Vijayalakshmi, Neelakantan; Sabitha, Kesavan; Shirley, Sundersingh; Selvaluxmy, Ganesharaja; Bose, Mayil Vahanan; Nambaru, Lavanya

    2009-01-01

    Cervical cancer is the most common cancer among Indian women. The current recommendations are to treat the stage IIB, IIIA, IIIB and IVA with radical radiotherapy and weekly cisplatin based chemotherapy. However, radiotherapy alone can help cure more than 60% of stage IIB and up to 40% of stage IIIB patients. Archival RNA samples from 15 patients who had achieved complete remission and stayed disease free for more than 36 months (No Evidence of Disease or NED group) and 10 patients who had failed radical radiotherapy (Failed group) were included in the study. The RNA samples were amplified, labelled and hybridized to Stanford microarray chips and analyzed using BRB Array Tools software and Significance Analysis of Microarray (SAM) analysis. 20 genes were selected for further validation using Relative Quantitation (RQ) Taqman assay in a Taqman Low-Density Array (TLDA) format. The RQ value was calculated, using each of the NED samples once as a calibrator. A scoring system was developed based on the RQ value for the genes. Using a seven gene based scoring system, it was possible to distinguish between the tumours which were likely to respond to the radiotherapy and those likely to fail. The mean score ± 2 SE (standard error of mean) was used and at a cut-off score of greater than 5.60, the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were 0.64, 1.0, 1.0, 0.67, respectively, for the low risk group. We have identified a 7 gene signature which could help identify patients with cervical cancer who can be treated with radiotherapy alone. However, this needs to be validated in a larger patient population.
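
    The four performance figures quoted above all derive from a 2×2 table of predicted versus observed outcome at the chosen cut-off. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the counts in the usage example are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from a 2x2 confusion table.

    tp -- true positives   (score above cut-off, outcome present)
    fp -- false positives  (score above cut-off, outcome absent)
    tn -- true negatives   (score at/below cut-off, outcome absent)
    fn -- false negatives  (score at/below cut-off, outcome present)
    """
    return {
        "sensitivity": tp / (tp + fn),  # detected fraction of true cases
        "specificity": tn / (tn + fp),  # rejected fraction of non-cases
        "ppv": tp / (tp + fp),          # reliability of a positive call
        "npv": tn / (tn + fn),          # reliability of a negative call
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
for name, value in diagnostic_metrics(tp=8, fp=1, tn=9, fn=2).items():
    print(f"{name}: {value:.2f}")
```

    A specificity and PPV of 1.0, as reported in the abstract, correspond to a table with no false positives; the sketch does not guard against empty rows or columns, which would cause a division by zero.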

  4. Using purine skews to predict genes in AT-rich poxviruses

    Directory of Open Access Journals (Sweden)

    Upton Chris

    2005-02-01

    Background: Clusters or runs of purines on the mRNA synonymous strand have been found in many different organisms including orthopoxviruses. The purine bias that is exhibited by these clusters can be observed using a purine skew and, in the case of poxviruses, these skews can be used to help determine the coding strand of a particular segment of the genome. Combined with previous findings that minor ORFs have lower than average aspartate and glutamate composition and higher than average serine composition, purine content can be used to predict the likelihood of a poxvirus ORF being a "real gene". Results: Using purine skews and a "quality" measure designed to incorporate previous findings about minor ORFs, we have found that in our training case (vaccinia virus strain Copenhagen), 59 of 65 minor (small and unlikely to be real genes) ORFs were correctly classified as being minor. Of the 201 major (large and likely to be real genes) vaccinia ORFs, 192 were correctly classified as being major. Performing a similar analysis with the entomopoxvirus amsacta moorei (AMEV), it was found that 4 major ORFs were incorrectly classified as minor and 9 minor ORFs were incorrectly classified as major. The purine abundance observed for major ORFs in vaccinia virus was found to stem primarily from the first codon position, with both the second and third codon positions containing roughly equal amounts of purines and pyrimidines. Conclusion: Purine skews and a "quality" measure can be used to predict functional ORFs, and purine skews in particular can be used to determine which of two overlapping ORFs is most likely to be the real gene if neither of the two ORFs has orthologs in other poxviruses.
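
    A purine skew can be pictured as a running tally along a sequence. The sketch below uses one common convention (add 1 for each purine, subtract 1 for each pyrimidine); this convention and the function name `purine_skew` are assumptions made for illustration, and the article itself may define the skew differently (for example, over sliding windows).

```python
def purine_skew(sequence):
    """Cumulative purine skew along a DNA sequence.

    Each purine (A or G) adds 1 to the running total and each
    pyrimidine (C or T) subtracts 1. Other symbols, such as N,
    leave the total unchanged.
    """
    skew = []
    total = 0
    for base in sequence.upper():
        if base in ("A", "G"):
            total += 1
        elif base in ("C", "T"):
            total -= 1
        skew.append(total)
    return skew


# A purine-rich stretch makes the curve rise; pyrimidines pull it down.
print(purine_skew("AAGGCT"))  # [1, 2, 3, 4, 3, 2]
```

    Under this convention, a segment where the curve rises is purine-rich on the strand being read, which is the kind of signal the abstract uses to decide which strand is more likely to be coding.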

  5. Prediction of lymphatic metastasis based on gene expression profile analysis after brachytherapy for early-stage oral tongue carcinoma

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Mogushi, Kaoru; Miura, Masahiko; Yoshimura, Ryo-ichi; Kurabayashi, Tohru; Shibuya, Hitoshi; Tanaka, Hiroshi; Noda, Shuhei; Iwakawa, Mayumi; Imai, Takashi

    2008-01-01

    Background and purpose: The management of lymphatic metastasis of early-stage oral tongue carcinoma patients is crucial for its prognosis. The purpose of this study was to evaluate the predictive ability of lymphatic metastasis after brachytherapy (BRT) for early-stage tongue carcinoma based on gene expression profiling. Patients and methods: Pre-therapeutic biopsies from 39 patients with T1 or T2 tongue cancer were analyzed for gene expression signatures using Codelink Uniset Human 20K Bioarray. All patients were treated with low dose-rate BRT for their primary lesions and underwent strict follow-up under a wait-and-see policy for cervical lymphatic metastasis. Candidate genes were selected for predicting lymph-node status in the reference group by the permutation test. Predictive accuracy was further evaluated by the prediction strength (PS) scoring system using an independent validation group. Results: We selected a set of 19 genes whose expression differed significantly between classes with or without lymphatic metastasis in the reference group. The lymph-node status in the validation group was predicted by the PS scoring system with an accuracy of 76%. Conclusions: Gene expression profiling using 19 genes in primary tumor tissues may allow prediction of lymphatic metastasis after BRT for early-stage oral tongue carcinoma

  6. Expression Pattern Similarities Support the Prediction of Orthologs Retaining Common Functions after Gene Duplication Events

    Science.gov (United States)

    Haberer, Georg; Panda, Arup; Das Laha, Shayani; Ghosh, Tapas Chandra; Schäffner, Anton R.

    2016-01-01

    The identification of functionally equivalent, orthologous genes (functional orthologs) across genomes is necessary for accurate transfer of experimental knowledge from well-characterized organisms to others. This frequently relies on automated, coding sequence-based approaches such as OrthoMCL, Inparanoid, and KOG, which usually work well for one-to-one homologous states. However, this strategy does not reliably work for plants due to the occurrence of extensive gene/genome duplication. Frequently, for one query gene, multiple orthologous genes are predicted in the other genome, and it is not clear a priori from sequence comparison and similarity which one preserves the ancestral function. We have studied 11 organ-dependent and stress-induced gene expression patterns of 286 Arabidopsis lyrata duplicated gene groups and compared them with the respective Arabidopsis (Arabidopsis thaliana) genes to predict putative expressologs and nonexpressologs based on gene expression similarity. Promoter sequence divergence as an additional tool to substantiate functional orthology only partially overlapped with expressolog classification. By cloning eight A. lyrata homologs and complementing them in the respective four Arabidopsis loss-of-function mutants, we experimentally proved that predicted expressologs are indeed functional orthologs, while nonexpressologs or nonfunctionalized orthologs are not. Our study demonstrates that even a small set of gene expression data in addition to sequence homologies are instrumental in the assignment of functional orthologs in the presence of multiple orthologs. PMID:27303025

  7. A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data

    Directory of Open Access Journals (Sweden)

    Olson James M

    2006-04-01

    Background: Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip®, which uses multiple oligonucleotide probes (i.e. a probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data. Results: We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma (27 metastatic and 42 non-metastatic) samples and four cerebellum samples as normal controls. We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic

  8. ABC gene-ranking for prediction of drug-induced cholestasis in rats

    Directory of Open Access Journals (Sweden)

    Yauheniya Cherkas

    drugs that behaved very differently, and were distinct from both non-cholestatic and cholestatic drugs (ketoconazole, dipyridamole, cyproheptadine and aniline), and many postulated human cholestatic drugs that in rat showed no evidence of cholestasis (chlorpromazine, erythromycin, niacin, captopril, dapsone, rifampicin, glibenclamide, simvastatin, furosemide, tamoxifen, and sulfamethoxazole). Most of these latter drugs were noted previously by other groups as showing cholestasis only in humans. The results of this work suggest that the ABC procedure and similar statistical approaches can be instrumental in combining data to compare toxicants across toxicogenomics databases, extract similarities among responses and reduce unexplained data variation. Keywords: Cluster analysis, Cholestasis, Gene signature, Microarray, Prediction, Toxicogenomics

  9. Computational Prediction of MicroRNAs from Toxoplasma gondii Potentially Regulating the Hosts’ Gene Expression

    Directory of Open Access Journals (Sweden)

    Müşerref Duygu Saçar

    2014-10-01

    MicroRNAs (miRNAs) were discovered two decades ago, yet there is still a great need for further studies elucidating their genesis and targeting in different phyla. Since experimental discovery and validation of miRNAs is difficult, computational predictions are indispensable, and today most computational approaches employ machine learning. Toxoplasma gondii, a parasite residing within the cells of its hosts, such as humans, uses miRNAs for its post-transcriptional gene regulation. It may also regulate its hosts' gene expression, which has been shown in brain cancer. Since previous studies have shown that overexpressed miRNAs within the host are causal for disease onset, we hypothesized that T. gondii could export miRNAs into its host cell. We computationally predicted all hairpins from the genome of T. gondii and used mouse and human models to filter possible candidates. These were then further compared to known miRNAs in humans and rodents, and their expression was examined for T. gondii grown in mouse and human hosts, respectively. We found that among the millions of potential hairpins in T. gondii, only a few thousand pass filtering using a human or mouse model and that even fewer of those are expressed. Since they are expressed and differentially expressed in rodents and humans, we suggest that there is a chance that T. gondii may export miRNAs into its hosts for direct regulation.

  10. Prediction of Associations between microRNAs and Gene Expression in Glioma Biology.

    Directory of Open Access Journals (Sweden)

    Stefan Wuchty

    Despite progress in the determination of miR interactions, their regulatory role in cancer is only beginning to be unraveled. Utilizing gene expression data from 27 glioblastoma samples, we found that the mere knowledge of physical interactions between specific mRNAs and miRs can be used to determine associated regulatory interactions, allowing us to identify 626 associated interactions, involving 128 miRs that putatively modulate the expression of 246 mRNAs. Experimentally determining the expression of miRs, we found an over-representation of over- (under-)expressed miRs with various predicted mRNA target sequences. Such significantly associated miRs that putatively bind over-expressed genes strongly tend to have binding sites near the 3'UTR of the corresponding mRNAs, suggesting that the presence of the miRs near the translation stop site may be a factor in their regulatory ability. Our analysis predicted a significant association between miR-128 and the protein kinase WEE1, which we subsequently validated experimentally by showing that the over-expression of the naturally under-expressed miR-128 in glioma cells resulted in the inhibition of WEE1 in glioblastoma cells.

  11. Can survival prediction be improved by merging gene expression data sets?

    Directory of Open Access Journals (Sweden)

    Haleh Yasrebi

    BACKGROUND: High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used for characterization of tumor biopsies for clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial study limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not clear at all whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This apparently was due to the fact that a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation of time to death or relapse, (e) the reduced number of genes in the merged data

  12. Chronic and Acute Stress, Gender, and Serotonin Transporter Gene-Environment Interactions Predicting Depression Symptoms in Youth

    Science.gov (United States)

    Hammen, Constance; Brennan, Patricia A.; Keenan-Miller, Danielle; Hazel, Nicholas A.; Najman, Jake M.

    2010-01-01

    Background: Many recent studies of serotonin transporter gene by environment effects predicting depression have used stress assessments with undefined or poor psychometric methods, possibly contributing to wide variation in findings. The present study attempted to distinguish between effects of acute and chronic stress to predict depressive…

  13. Bioinformatic Prediction of Gene Functions Regulated by Quorum Sensing in the Bioleaching Bacterium Acidithiobacillus ferrooxidans

    Directory of Open Access Journals (Sweden)

    Alvaro Banderas

    2013-08-01

    The biomining bacterium Acidithiobacillus ferrooxidans oxidizes sulfide ores and promotes metal solubilization. The efficiency of this process depends on the attachment of cells to surfaces, a process regulated by quorum sensing (QS) cell-to-cell signalling in many Gram-negative bacteria. At. ferrooxidans has a functional QS system and the presence of AHLs enhances its attachment to pyrite. However, direct targets of the QS transcription factor AfeR remain unknown. In this study, a bioinformatic approach was used to infer possible AfeR direct targets based on the particular palindromic features of the AfeR binding site. A set of Hidden Markov Models designed to maintain palindromic regions and vary non-palindromic regions was used to screen for putative binding sites. By annotating the context of each predicted binding site (PBS), we classified them according to their positional coherence relative to other putative genomic structures such as start codons, RNA polymerase promoter elements and intergenic regions. We further used the Multiple EM for Motif Elicitation algorithm (MEME) to further filter out low homology PBSs. In summary, 75 target-genes were identified, 34 of which have a higher confidence level. Among the identified genes, we found afeR itself, zwf, genes encoding glycosyltransferase activities, metallo-beta lactamases, and active transport-related proteins. Glycosyltransferases and Zwf (Glucose 6-phosphate-1-dehydrogenase) might be directly involved in polysaccharide biosynthesis and attachment to minerals by At. ferrooxidans cells during the bioleaching process.

  14. Bioinformatic Prediction of Gene Functions Regulated by Quorum Sensing in the Bioleaching Bacterium Acidithiobacillus ferrooxidans

    Science.gov (United States)

    Banderas, Alvaro; Guiliani, Nicolas

    2013-01-01

    The biomining bacterium Acidithiobacillus ferrooxidans oxidizes sulfide ores and promotes metal solubilization. The efficiency of this process depends on the attachment of cells to surfaces, a process regulated by quorum sensing (QS) cell-to-cell signalling in many Gram-negative bacteria. At. ferrooxidans has a functional QS system and the presence of AHLs enhances its attachment to pyrite. However, direct targets of the QS transcription factor AfeR remain unknown. In this study, a bioinformatic approach was used to infer possible AfeR direct targets based on the particular palindromic features of the AfeR binding site. A set of Hidden Markov Models designed to maintain palindromic regions and vary non-palindromic regions was used to screen for putative binding sites. By annotating the context of each predicted binding site (PBS), we classified them according to their positional coherence relative to other putative genomic structures such as start codons, RNA polymerase promoter elements and intergenic regions. We further used the Multiple EM for Motif Elicitation algorithm (MEME) to further filter out low homology PBSs. In summary, 75 target-genes were identified, 34 of which have a higher confidence level. Among the identified genes, we found afeR itself, zwf, genes encoding glycosyltransferase activities, metallo-beta lactamases, and active transport-related proteins. Glycosyltransferases and Zwf (Glucose 6-phosphate-1-dehydrogenase) might be directly involved in polysaccharide biosynthesis and attachment to minerals by At. ferrooxidans cells during the bioleaching process. PMID:23959118
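
    The palindromic property that the screen relies on can be illustrated with a much simpler stand-in than the Hidden Markov Models described above. The sketch below scores a DNA window by how closely it matches its own reverse complement and scans a sequence for high-scoring windows; the function names, the window width and the score cut-off are all hypothetical, and this is not the authors' method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def palindrome_score(site):
    """Fraction of positions at which a site equals its reverse complement.

    A score of 1.0 means a perfect DNA palindrome (e.g. GAATTC). With an
    odd-length site the central base can never match, so the maximum is
    below 1.0.
    """
    site = site.upper()
    rc = reverse_complement(site)
    return sum(a == b for a, b in zip(site, rc)) / len(site)


def scan_for_palindromes(genome, width, min_score):
    """Return (start, window, score) for windows scoring >= min_score."""
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        score = palindrome_score(window)
        if score >= min_score:
            hits.append((start, window, score))
    return hits


# Toy sequence with one perfect 6-bp palindrome at offset 2.
print(scan_for_palindromes("CCGAATTCGG", width=6, min_score=1.0))
```

    A real screen would tolerate mismatches in the non-palindromic spacer and weigh genomic context, which is where the probabilistic models and the positional filtering described in the abstract come in.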

  15. Calibration of Multiple In Silico Tools for Predicting Pathogenicity of Mismatch Repair Gene Missense Substitutions

    Science.gov (United States)

    Thompson, Bryony A.; Greenblatt, Marc S.; Vallee, Maxime P.; Herkert, Johanna C.; Tessereau, Chloe; Young, Erin L.; Adzhubey, Ivan A.; Li, Biao; Bell, Russell; Feng, Bingjian; Mooney, Sean D.; Radivojac, Predrag; Sunyaev, Shamil R.; Frebourg, Thierry; Hofstra, Robert M.W.; Sijmons, Rolf H.; Boucher, Ken; Thomas, Alun; Goldgar, David E.; Spurdle, Amanda B.; Tavtigian, Sean V.

    2015-01-01

    Classification of rare missense substitutions observed during genetic testing for patient management is a considerable problem in clinical genetics. The Bayesian integrated evaluation of unclassified variants is a solution originally developed for BRCA1/2. Here, we take a step toward an analogous system for the mismatch repair (MMR) genes (MLH1, MSH2, MSH6, and PMS2) that confer colon cancer susceptibility in Lynch syndrome by calibrating in silico tools to estimate prior probabilities of pathogenicity for MMR gene missense substitutions. A qualitative five-class classification system was developed and applied to 143 MMR missense variants. This identified 74 missense substitutions suitable for calibration. These substitutions were scored using six different in silico tools (Align-Grantham Variation Grantham Deviation, multivariate analysis of protein polymorphisms [MAPP], Mut-Pred, PolyPhen-2.1, Sorting Intolerant From Tolerant, and Xvar), using curated MMR multiple sequence alignments where possible. The output from each tool was calibrated by regression against the classifications of the 74 missense substitutions; these calibrated outputs are interpretable as prior probabilities of pathogenicity. MAPP was the most accurate tool and MAPP + PolyPhen-2.1 provided the best-combined model (R2 = 0.62 and area under receiver operating characteristic = 0.93). The MAPP + PolyPhen-2.1 output is sufficiently predictive to feed as a continuous variable into the quantitative Bayesian integrated evaluation for clinical classification of MMR gene missense substitutions. PMID:22949387

  16. In silico prediction of functional loss of cst3 gene in hereditary cerebral amyloid angiopathy

    Directory of Open Access Journals (Sweden)

    Piyush Choudhary

    2013-12-01

    The present study computationally identifies missense mutations in the CST3 (cystatin 3, or cystatin C) gene. Missense mutations in the CST3 gene lead to hereditary cerebral amyloid angiopathy. The analysis began with SIFT, followed by PolyPhen-2 and I-Mutant 2.0, using 24 variants of the Homo sapiens CST3 gene derived from dbSNP. The analysis showed that 5 variants (Y60C, C123Y, L19P, Y88C, L94Q) were found to be less stable and damaging by SIFT, PolyPhen-2 and I-Mutant 2.0. Furthermore, the outputs of SNPs&GO were corroborated with PhD-SNP (Predictor of Human Deleterious Single Nucleotide Polymorphisms) and PANTHER, which predicted the same 5 variants (Y60C, Y88C, C123Y, L19P, and L94Q) to have clinical impact in causing the disease. These findings will certainly be helpful to medical practitioners in the treatment of cerebral amyloid angiopathy.

  17. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction

    Directory of Open Access Journals (Sweden)

    Garzón-Martínez Gina A

    2012-04-01

    Background: Physalis peruviana, commonly known as Cape gooseberry, is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results: We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions: We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the

  18. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction.

    Science.gov (United States)

    Garzón-Martínez, Gina A; Zhu, Z Iris; Landsman, David; Barrero, Luz S; Mariño-Ramírez, Leonardo

    2012-04-25

    Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S

  19. Whole genome transcript profiling of drug induced steatosis in rats reveals a gene signature predictive of outcome.

    Directory of Open Access Journals (Sweden)

    Nishika Sahini

    Drug induced steatosis (DIS) is characterised by excess triglyceride accumulation in the form of lipid droplets (LD) in liver cells. To explore mechanisms underlying DIS we interrogated the publicly available microarray data from the Japanese Toxicogenomics Project (TGP) to study comprehensively whole genome gene expression changes in the liver of treated rats. For this purpose a total of 17 and 12 drugs which are diverse in molecular structure and mode of action were considered based on their ability to cause either steatosis or phospholipidosis, respectively, while 7 drugs served as negative controls. In our efforts we focused on 200 genes which are considered to be mechanistically relevant in the process of lipid droplet biogenesis in hepatocytes as recently published (Sahini and Borlak, 2014). Based on mechanistic considerations we identified 19 genes which displayed dose dependent responses while 10 genes showed time dependency. Importantly, the present study defined 9 genes (ANGPTL4, FABP7, FADS1, FGF21, GOT1, LDLR, GK, STAT3, and PKLR) as signature genes to predict DIS. Moreover, cross tabulation revealed 9 genes to be regulated ≥10 times amongst the various conditions and included genes linked to glucose metabolism, lipid transport and lipogenesis as well as signalling events. Additionally, a comparison between drugs causing phospholipidosis and/or steatosis revealed 26 genes to be regulated in common including 4 signature genes to predict DIS (PKLR, GK, FABP7 and FADS1). Furthermore, a comparison between in vivo single dose (3, 6, 9 and 24 h) and findings from rat hepatocyte studies (2 h, 8 h, 24 h) identified 10 genes which are regulated in common and contained 2 DIS signature genes (FABP7, FGF21). Altogether, our studies provide comprehensive information on mechanistically linked gene expression changes of a range of drugs causing steatosis and phospholipidosis and encourage the screening of DIS signature genes at the preclinical stage.

  20. Landscape genetics as a tool for conservation planning: predicting the effects of landscape change on gene flow.

    Science.gov (United States)

    van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine

    2014-03-01

    For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (landscape composition as well as some measures of habitat configuration. Additionally, a complete sampling of all populations in our study area allowed incorporating measures of population topology. These measures together with the landscape metrics formed the predictor variables in linear models with gene flow as response variable (F(ST) and mean pairwise assignment probability). With a modified leave-one-out cross-validation approach, we selected the model with the highest predictive accuracy. With this model, we predicted gene flow under several landscape-change scenarios, which simulated construction, rezoning or restoration projects, and the establishment of a new population. For some landscape-change scenarios, significant increase or decrease in gene flow was predicted, while for others little change was forecast. Furthermore, we found that the measures of population topology strongly increase model fit in landscape genetic analysis. This study demonstrates the use of predictive landscape-genetic models in conservation and landscape planning.

  1. Prediction of metabolic flux distribution from gene expression data based on the flux minimization principle.

    Directory of Open Access Journals (Sweden)

    Hyun-Seob Song

    Prediction of possible flux distributions in a metabolic network provides detailed phenotypic information that links metabolism to cellular physiology. To estimate metabolic steady-state fluxes, the most common approach is to solve a set of macroscopic mass balance equations subjected to stoichiometric constraints while attempting to optimize an assumed optimal objective function. This assumption is justifiable in specific cases but may be invalid when tested across different conditions, cell populations, or other organisms. With an aim to providing a more consistent and reliable prediction of flux distributions over a wide range of conditions, in this article we propose a framework that uses the flux minimization principle to predict active metabolic pathways from mRNA expression data. The proposed algorithm minimizes a weighted sum of flux magnitudes, while biomass production can be bounded to fit an ample range from very low to very high values according to the analyzed context. We have formulated the flux weights as a function of the corresponding enzyme reaction's gene expression value, enabling the creation of context-specific fluxes based on a generic metabolic network. In case studies of wild-type Saccharomyces cerevisiae, and wild-type and mutant Escherichia coli strains, our method achieved high prediction accuracy, as gauged by correlation coefficients and sums of squared error, with respect to the experimentally measured values. In contrast to other approaches, our method was able to provide quantitative predictions for both model organisms under a variety of conditions. Our approach requires no prior knowledge or assumption of a context-specific metabolic functionality and does not require trial-and-error parameter adjustments. Thus, our framework is of general applicability for modeling the transcription-dependent metabolism of bacteria and yeasts.
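    The optimization described in this abstract can be written compactly. The formulation below is only a reading of the abstract's wording, not the authors' published equations: the symbols (stoichiometric matrix S, flux vector v, weights w_j, expression values g_j, biomass bounds) are notation assumed here, and the weight function f is left unspecified because the abstract does not give it.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{j} w_j \,\lvert v_j \rvert \\
\text{subject to}\quad & S\,v = 0, \\
& v_j^{\mathrm{lb}} \le v_j \le v_j^{\mathrm{ub}} \quad \text{for all reactions } j, \\
& \mu^{\min} \le v_{\mathrm{biomass}} \le \mu^{\max}, \\
\text{with}\quad & w_j = f(g_j).
\end{aligned}
```

    Here S v = 0 is the steady-state mass balance, the biomass bounds correspond to the range "from very low to very high values" mentioned above, and f maps the expression value g_j of the gene for reaction j to a penalty weight.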

  2. Mining predicted essential genes of Brugia malayi for nematode drug targets.

    Directory of Open Access Journals (Sweden)

    Sanjay Kumar

    We report results from the first genome-wide application of a rational drug target selection methodology to a metazoan pathogen genome, the completed draft sequence of Brugia malayi, a parasitic nematode responsible for human lymphatic filariasis. More than 1.5 billion people worldwide are at risk of contracting lymphatic filariasis and onchocerciasis, a related filarial disease. Drug treatments for filariasis have not changed significantly in over 20 years, and with the risk of resistance rising, there is an urgent need for the development of new anti-filarial drug therapies. The recent publication of the draft genomic sequence for B. malayi enables a genome-wide search for new drug targets. However, there is no functional genomics data in B. malayi to guide the selection of potential drug targets. To circumvent this problem, we have utilized the free-living model nematode Caenorhabditis elegans as a surrogate for B. malayi. Sequence comparisons between the two genomes allow us to map C. elegans orthologs to B. malayi genes. Using these orthology mappings and by incorporating the extensive genomic and functional genomic data, including genome-wide RNAi screens, that already exist for C. elegans, we identify potentially essential genes in B. malayi. Further incorporation of human host genome sequence data and a custom algorithm for prioritization enables us to collect and rank nearly 600 drug target candidates. Previously identified potential drug targets cluster near the top of our prioritized list, lending credibility to our methodology. Over-represented Gene Ontology terms, predicted InterPro domains, and RNAi phenotypes of C. elegans orthologs associated with the potential target pool are identified. By virtue of the selection procedure, the potential B. malayi drug targets highlight components of key processes in nematode biology such as central metabolism, molting and regulation of gene expression.

  3. Integrating circadian activity and gene expression profiles to predict chronotoxicity of Drosophila suzukii response to insecticides.

    Science.gov (United States)

    Hamby, Kelly A; Kwok, Rosanna S; Zalom, Frank G; Chiu, Joanna C

    2013-01-01

    Native to Southeast Asia, Drosophila suzukii (Matsumura) is a recent invader that infests intact ripe and ripening fruit, leading to significant crop losses in the U.S., Canada, and Europe. Since current D. suzukii management strategies rely heavily on insecticide usage and insecticide detoxification gene expression is under circadian regulation in the closely related Drosophila melanogaster, we set out to determine if integrative analysis of daily activity patterns and detoxification gene expression can predict chronotoxicity of D. suzukii to insecticides. Locomotor assays were performed under conditions that approximate a typical summer or winter day in Watsonville, California, where D. suzukii was first detected in North America. As expected, daily activity patterns of D. suzukii appeared quite different between 'summer' and 'winter' conditions due to differences in photoperiod and temperature. In the 'summer', D. suzukii assumed a more bimodal activity pattern, with maximum activity occurring at dawn and dusk. In the 'winter', activity was unimodal and restricted to the warmest part of the circadian cycle. Expression analysis of six detoxification genes and acute contact bioassays were performed at multiple circadian times, but only in conditions approximating Watsonville summer, the cropping season, when most insecticide applications occur. Five of the genes tested exhibited rhythmic expression, with the majority showing peak expression at dawn (ZT0, 6am). We observed significant differences in the chronotoxicity of D. suzukii towards malathion, with highest susceptibility at ZT0 (6am), corresponding to peak expression of cytochrome P450s that may be involved in bioactivation of malathion. High activity levels were not found to correlate with high insecticide susceptibility as initially hypothesized. Chronobiology and chronotoxicity of D. suzukii provide valuable insights for monitoring and control efforts, because insect activity as well as insecticide timing

  4. A novel gene network inference algorithm using predictive minimum description length approach.

    Science.gov (United States)

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the
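    The role of mutual information (MI) in this kind of network inference can be sketched as follows. The example computes MI between discretized expression profiles and keeps gene pairs above a cutoff. It is a minimal illustration only: the fixed `threshold` argument stands in for exactly the quantity that the PMDL principle is said to choose automatically, CMI is omitted, and all function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


def infer_edges(profiles, threshold):
    """Connect gene pairs whose MI reaches a fixed, user-chosen threshold (an assumption here)."""
    genes = sorted(profiles)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if mutual_information(profiles[g], profiles[h]) >= threshold:
                edges.append((g, h))
    return edges


if __name__ == "__main__":
    # Hypothetical binarized time series (1 = high expression, 0 = low).
    toy = {
        "geneA": [0, 0, 1, 1],
        "geneB": [0, 0, 1, 1],
        "geneC": [0, 1, 0, 1],
    }
    print(infer_edges(toy, threshold=0.5))
```

    Choosing `threshold` by hand is the step the abstract identifies as problematic; the sketch makes that dependence visible rather than solving it.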

  5. Predicting spatial and temporal gene expression using an integrative model of transcription factor occupancy and chromatin state.

    Directory of Open Access Journals (Sweden)

    Bartek Wilczynski

    Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converge to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal

  6. Google goes cancer: improving outcome prediction for cancer patients by network-based ranking of marker genes.

    Directory of Open Access Journals (Sweden)

    Christof Winter

    Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice.
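    The idea of ranking genes "in a manner similar to Google's PageRank" can be illustrated with a small personalized PageRank iteration, in which each gene's restart probability is proportional to a per-gene relevance value (for example, the strength of its association with survival). This is a generic sketch under assumptions made here, not the authors' published algorithm: the undirected adjacency-list network, the `relevance` values, the damping factor and the function name are all hypothetical.

```python
def rank_genes(network, relevance, damping=0.85, iterations=100):
    """Personalized PageRank over an undirected gene network.

    network:   dict mapping each gene to a list of its neighbours
               (assumed symmetric, every gene has at least one neighbour).
    relevance: dict mapping each gene to a non-negative expression-based score.
    """
    genes = sorted(network)
    total = sum(relevance[g] for g in genes)
    prior = {g: relevance[g] / total for g in genes}
    score = dict(prior)
    for _ in range(iterations):
        updated = {}
        for g in genes:
            # Each neighbour shares its current score equally among its own neighbours.
            incoming = sum(score[n] / len(network[n]) for n in network[g])
            updated[g] = (1 - damping) * prior[g] + damping * incoming
        score = updated
    ranking = sorted(genes, key=lambda g: score[g], reverse=True)
    return ranking, score


if __name__ == "__main__":
    # Hypothetical star-shaped network: gene "A" interacts with all others.
    toy_network = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
    toy_relevance = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
    ranking, score = rank_genes(toy_network, toy_relevance)
    print(ranking[0], round(score[ranking[0]], 3))
```

    With uniform relevance the well-connected gene rises to the top purely through network position; skewing `relevance` lets expression evidence and connectivity trade off against each other.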

  7. Gene expression patterns in formalin-fixed, paraffin-embedded core biopsies predict docetaxel chemosensitivity in breast cancer patients.

    Science.gov (United States)

    Chang, Jenny C; Makris, Andreas; Gutierrez, M Carolina; Hilsenbeck, Susan G; Hackett, James R; Jeong, Jennie; Liu, Mei-Lan; Baker, Joffre; Clark-Langone, Kim; Baehner, Frederick L; Sexton, Krsytal; Mohsin, Syed; Gray, Tara; Alvarez, Laura; Chamness, Gary C; Osborne, C Kent; Shak, Steven

    2008-03-01

    Previously, we had identified gene expression patterns that predicted response to neoadjuvant docetaxel. Other studies have validated that a high Recurrence Score (RS) by the 21-gene RT-PCR assay is predictive of worse prognosis but better response to chemotherapy. We investigated whether tumor expression of these 21 genes and other candidate genes can predict response to docetaxel. Core biopsies from 97 patients were obtained before treatment with neoadjuvant docetaxel (4 cycles, 100 mg/m2 q3 weeks). Three 10-microm FFPE sections were submitted for quantitative RT-PCR assays of 192 genes that were selected from our previous work and the literature. Of the 97 patients, 81 (84%) had sufficient invasive cancer, 80 (82%) had sufficient RNA for QRTPCR assay, and 72 (74%) had clinical response data. Mean age was 48.5 years, and the median tumor size was 6 cm. Clinical complete responses (CR) were observed in 12 (17%), partial responses in 41 (57%), stable disease in 17 (24%), and progressive disease in 2 patients (3%). A significant relationship (P<0.05) between gene expression and CR was observed for 14 genes, including CYBA. CR was associated with lower expression of the ER gene group and higher expression of the proliferation gene group from the 21 gene assay. Of note, CR was more likely with a high RS (P=0.008). We have established molecular profiles of sensitivity to docetaxel. RT-PCR technology provides a potential platform for a predictive test of docetaxel chemosensitivity using small amounts of routinely processed material.

  8. Prediction of essential proteins based on subcellular localization and gene expression correlation.

    Science.gov (United States)

    Fan, Yetian; Tang, Xiwei; Hu, Xiaohua; Wu, Wei; Ping, Qing

    2017-12-01

    Essential proteins are indispensable to the survival and development process of living organisms. To understand the functional mechanisms of essential proteins, which can be applied to the analysis of disease and design of drugs, it is important to identify essential proteins from a set of proteins first. As traditional experimental methods designed to test out essential proteins are usually expensive and laborious, computational methods, which utilize biological and topological features of proteins, have attracted more attention in recent years. Protein-protein interaction networks, together with other biological data, have been explored to improve the performance of essential protein prediction. The proposed method SCP is evaluated on Saccharomyces cerevisiae datasets and compared with five other methods. The results show that our method SCP outperforms the other five methods in terms of accuracy of essential protein prediction. In this paper, we propose a novel algorithm named SCP, which combines the ranking by a modified PageRank algorithm based on subcellular compartments information, with the ranking by Pearson correlation coefficient (PCC) calculated from gene expression data. Experiments show that subcellular localization information is promising in boosting essential protein prediction.

  9. Gene expression signatures that predict radiation exposure in mice and humans.

    Directory of Open Access Journals (Sweden)

    Holly K Dressman

    2007-04-01

    The capacity to assess environmental inputs to biological phenotypes is limited by methods that can accurately and quantitatively measure these contributions. One such example can be seen in the context of exposure to ionizing radiation. We have made use of gene expression analysis of peripheral blood (PB) mononuclear cells to develop expression profiles that accurately reflect prior radiation exposure. We demonstrate that expression profiles can be developed that not only predict radiation exposure in mice but also distinguish the level of radiation exposure, ranging from 50 cGy to 1,000 cGy. Likewise, a molecular signature of radiation response developed solely from irradiated human patient samples can predict and distinguish irradiated human PB samples from nonirradiated samples with an accuracy of 90%, sensitivity of 85%, and specificity of 94%. We further demonstrate that a radiation profile developed in the mouse can correctly distinguish PB samples from irradiated and nonirradiated human patients with an accuracy of 77%, sensitivity of 82%, and specificity of 75%. Taken together, these data demonstrate that molecular profiles can be generated that are highly predictive of different levels of radiation exposure in mice and humans. We suggest that this approach, with additional refinement, could provide a method to assess the effects of various environmental inputs into biological phenotypes as well as providing a more practical application of a rapid molecular screening test for the diagnosis of radiation exposure.
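    For readers less familiar with the three performance figures quoted above, the sketch below shows how accuracy, sensitivity and specificity follow from the counts of correctly and incorrectly classified samples. The counts in the example are invented for illustration and are not taken from the study; the function name is likewise hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: irradiated samples called irradiated      fn: irradiated samples missed
    tn: non-irradiated samples called negative    fp: non-irradiated samples called irradiated
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Made-up counts, NOT data from the abstract.
    print(classification_metrics(tp=8, fn=2, tn=9, fp=1))
```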

  10. Response-predictive gene expression profiling of glioma progenitor cells in vitro.

    Directory of Open Access Journals (Sweden)

    Sylvia Moeckel

    High-grade gliomas are amongst the most deadly human tumors. Treatment results are disappointing. Still, in several trials around 20% of patients respond to therapy. To date, diagnostic strategies to identify patients that will profit from a specific therapy do not exist. In this study, we used serum-free short-term treated in vitro cell cultures to predict treatment response in vitro. This approach allowed us (a) to enrich specimens for brain tumor initiating cells and (b) to confront cells with a therapeutic agent before expression profiling. As a proof of principle we analyzed gene expression in 18 short-term serum-free cultures of high-grade gliomas enhanced for brain tumor initiating cells (BTIC) before and after in vitro treatment with the tyrosine kinase inhibitor Sunitinib. Profiles from treated progenitor cells allowed prediction of therapy-induced impairment of proliferation in vitro. For the tyrosine kinase inhibitor Sunitinib used in this dataset, the approach revealed additional predictive information in comparison to the evaluation of classical signaling analysis.

  11. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    Science.gov (United States)

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The

  12. Gene expression programming for prediction of scour depth downstream of sills

    Science.gov (United States)

    Azamathulla, H. Md.

    2012-08-01

    Summary: Local scour is crucial in the degradation of river bed and the stability of grade control structures, stilling basins, aprons, ski-jump bucket spillways, bed sills, weirs, check dams, etc. This short communication presents gene-expression programming (GEP), which is an extension to genetic programming (GP), as an alternative approach to predict scour depth downstream of sills. Published data were compiled from the literature for the scour depth downstream of sills. The proposed GEP approach gives satisfactory results (R2 = 0.967 and RMSE = 0.088) compared to the existing predictors (Chinnarasri and Kositgittiwong, 2008) with R2 = 0.87 and RMSE = 2.452 for relative scour depth.
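
    The two fit statistics quoted above can be computed as in the following minimal sketch. It assumes R2 is the coefficient of determination (1 - SS_res / SS_tot); the record does not say whether the authors used this definition or the squared correlation. The data values are hypothetical and are not taken from the paper.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical relative scour depths, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

    RMSE carries the units of the predicted quantity, so it is only comparable between predictors evaluated on the same scale.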

  13. Boolean Dynamic Modeling Approaches to Study Plant Gene Regulatory Networks: Integration, Validation, and Prediction.

    Science.gov (United States)

    Velderraín, José Dávila; Martínez-García, Juan Carlos; Álvarez-Buylla, Elena R

    2017-01-01

    Mathematical models based on dynamical systems theory are well-suited tools for the integration of available molecular experimental data into coherent frameworks in order to propose hypotheses about the cooperative regulatory mechanisms driving developmental processes. Computational analysis of the proposed models using well-established methods enables testing the hypotheses by contrasting predictions with observations. Within such framework, Boolean gene regulatory network dynamical models have been extensively used in modeling plant development. Boolean models are simple and intuitively appealing, ideal tools for collaborative efforts between theorists and experimentalists. In this chapter we present protocols used in our group for the study of diverse plant developmental processes. We focus on conceptual clarity and practical implementation, providing directions to the corresponding technical literature.
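
    As an illustration of the kind of model this record describes, the sketch below runs a synchronous Boolean network and enumerates its attractors by exhaustive simulation. The three-gene network and its rules are hypothetical; they are not taken from the chapter, whose protocols may use other update schemes or tools.

```python
from itertools import product

# Hypothetical three-gene network; the rules are illustrative only.
RULES = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}
GENES = sorted(RULES)


def step(state):
    """Synchronous update: every gene reads the same previous state."""
    s = dict(zip(GENES, state))
    return tuple(bool(RULES[g](s)) for g in GENES)


def attractor_from(state):
    """Iterate until a state repeats; return the cycle in a canonical rotation."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    k = cycle.index(min(cycle))
    return tuple(cycle[k:] + cycle[:k])


def attractors():
    """Collect the distinct attractors reached from all 2^n initial states."""
    return {attractor_from(s) for s in product([False, True], repeat=len(GENES))}


for cycle in attractors():
    print(len(cycle), "state(s):", cycle)
```

    Exhaustive enumeration is only practical for small networks, since the state space doubles with each added gene.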

  14. A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

    Science.gov (United States)

    Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell

    2013-01-01

    Background: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. Results: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. Conclusions: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637

  15. A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

    Directory of Open Access Journals (Sweden)

    Meysam Bastani

    BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. RESULTS: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. CONCLUSIONS: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.

  16. Dopamine Gene Profiling to Predict Impulse Control and Effects of Dopamine Agonist Ropinirole.

    Science.gov (United States)

    MacDonald, Hayley J; Stinear, Cathy M; Ren, April; Coxon, James P; Kao, Justin; Macdonald, Lorraine; Snow, Barry; Cramer, Steven C; Byblow, Winston D

    2016-07-01

    Dopamine agonists can impair inhibitory control and cause impulse control disorders for those with Parkinson disease (PD), although mechanistically this is not well understood. In this study, we hypothesized that the extent of such drug effects on impulse control is related to specific dopamine gene polymorphisms. This double-blind, placebo-controlled study aimed to examine the effect of single doses of 0.5 and 1.0 mg of the dopamine agonist ropinirole on impulse control in healthy adults of typical age for PD onset. Impulse control was measured by stop signal RT on a response inhibition task and by an index of impulsive decision-making on the Balloon Analogue Risk Task. A dopamine genetic risk score quantified basal dopamine neurotransmission from the influence of five genes: catechol-O-methyltransferase, dopamine transporter, and those encoding receptors D1, D2, and D3. With placebo, impulse control was better for the high versus low genetic risk score groups. Ropinirole modulated impulse control in a manner dependent on genetic risk score. For the lower score group, both doses improved response inhibition (decreased stop signal RT) whereas the lower dose reduced impulsiveness in decision-making. Conversely, the higher score group showed a trend for worsened response inhibition on the lower dose whereas both doses increased impulsiveness in decision-making. The implications of the present findings are that genotyping can be used to predict impulse control and whether it will improve or worsen with the administration of dopamine agonists.

  17. Multiple genetic interaction experiments provide complementary information useful for gene function prediction.

    Directory of Open Access Journals (Sweden)

    Magali Michaut

    Genetic interactions help map biological processes and their functional relationships. A genetic interaction is defined as a deviation from the expected phenotype when combining multiple genetic mutations. In Saccharomyces cerevisiae, most genetic interactions are measured under a single phenotype - growth rate in standard laboratory conditions. Recently genetic interactions have been collected under different phenotypic readouts and experimental conditions. How different are these networks and what can we learn from their differences? We conducted a systematic analysis of quantitative genetic interaction networks in yeast performed under different experimental conditions. We find that networks obtained using different phenotypic readouts, in different conditions and from different laboratories overlap less than expected and provide significant unique information. To exploit this information, we develop a novel method to combine individual genetic interaction data sets and show that the resulting network improves gene function prediction performance, demonstrating that individual networks provide complementary information. Our results support the notion that using diverse phenotypic readouts and experimental conditions will substantially increase the amount of gene function information produced by genetic interaction screens.

  18. AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density.

    Science.gov (United States)

    Zhao, X G; Dai, W; Li, Y; Tian, L

    2011-11-01

    The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'golden' measure for the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, the AUC-based ensemble methods are rather scant, largely due to the fact that the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm for identifying the optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large. We have proposed a novel AUC-based statistical ensemble method for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function by a convex surrogate loss function, whose minimizer can be efficiently identified. With the established framework, the lasso and other regularization techniques enable feature selection. Extensive simulations have demonstrated the superiority of the new methods to the existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores to differentiate elderly women with low bone mineral density (BMD) from those with normal BMD. The AUCs of the resulting scores in the independent test dataset have been satisfactory. Aiming to maximize the AUC directly, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful in high-dimensional settings.
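
    The idea of swapping the discontinuous AUC objective for a convex surrogate can be sketched as follows. The abstract does not name the surrogate the authors used, so the pairwise hinge loss here is an assumption chosen for illustration, and the scores are hypothetical.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos_scores) * len(neg_scores))


def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge loss over pairs; with margin=1 it upper-bounds 1 - AUC."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += max(0.0, margin - (p - n))
    return total / (len(pos_scores) * len(neg_scores))


def combine(weights, markers):
    """Linear biomarker score: one weighted sum per subject (row)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in markers]


# Hypothetical combined scores for cases (pos) and controls (neg).
pos = [0.9, 0.8, 0.4]
neg = [0.5, 0.3, 0.2]
print("AUC        =", empirical_auc(pos, neg))
print("hinge loss =", pairwise_hinge_loss(pos, neg))
```

    Because the hinge loss is convex in the scores, and the scores are linear in the weights passed to `combine`, the loss is convex in the weights, which is what makes penalties such as the lasso straightforward to add.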

  19. Interactions of adolescent social experiences and dopamine genes to predict physical intimate partner violence perpetration.

    Directory of Open Access Journals (Sweden)

    Laura M Schwab-Reese

    We examined the interactions between three dopamine gene alleles (DAT1, DRD2, DRD4) previously associated with violent behavior and two components of the adolescent environment (exposure to violence, school social environment) to predict adulthood physical intimate partner violence (IPV) perpetration among white men and women. We used data from Wave IV of the National Longitudinal Study of Adolescent to Adult Health, a cohort study following individuals from adolescence to adulthood. Based on the prior literature, we categorized participants as at risk for each of the three dopamine genes using this coding scheme: two 10-R alleles for DAT1; at least one A-1 allele for DRD2; at least one 7-R or 8-R allele for DRD4. Adolescent exposure to violence and school social environment was measured in 1994 and 1995 when participants were in high school or middle school. Intimate partner violence perpetration was measured in 2008 when participants were 24 to 32 years old. We used simple and multivariable logistic regression models, including interactions of genes and the adolescent environments for the analysis. Presence of risk alleles was not independently associated with IPV perpetration but increasing exposure to violence and disconnection from the school social environment was associated with physical IPV perpetration. The effects of these adolescent experiences on physical IPV perpetration varied by dopamine risk allele status. Among individuals with non-risk dopamine alleles, increased exposure to violence during adolescence and perception of disconnection from the school environment were significantly associated with increased odds of physical IPV perpetration, but individuals with high risk alleles, overall, did not experience the same increase. Our results suggested the effects of adolescent environment on adulthood physical IPV perpetration varied by genetic factors. This analysis did not find a direct link between risk alleles and violence, but

  20. Melanopsin gene variations interact with season to predict sleep onset and chronotype.

    Science.gov (United States)

    Roecklein, Kathryn A; Wong, Patricia M; Franzen, Peter L; Hasler, Brant P; Wood-Vasey, W Michael; Nimgaonkar, Vishwajit L; Miller, Megan A; Kepreos, Kyle M; Ferrell, Robert E; Manuck, Stephen B

    2012-10-01

    The human melanopsin gene has been reported to mediate risk for seasonal affective disorder (SAD), which is hypothesized to be caused by decreased photic input during winter when light levels fall below threshold, resulting in differences in circadian phase and/or sleep. However, it is unclear if melanopsin increases risk of SAD by causing differences in sleep or circadian phase, or if those differences are symptoms of the mood disorder. To determine if melanopsin sequence variations are associated with differences in sleep-wake behavior among those not suffering from a mood disorder, the authors tested associations between melanopsin gene polymorphisms and self-reported sleep timing (sleep onset and wake time) in a community sample (N = 234) of non-Hispanic Caucasian participants (age 30-54 yrs) with no history of psychological, neurological, or sleep disorders. The authors also tested the effect of melanopsin variations on differences in preferred sleep and activity timing (i.e., chronotype), which may reflect differences in circadian phase, sleep homeostasis, or both. Daylength on the day of assessment was measured and included in analyses. DNA samples were genotyped for melanopsin gene polymorphisms using fluorescence polarization. P10L genotype interacted with daylength to predict self-reported sleep onset (interaction p sleep onset among those with the TT genotype was later in the day when individuals were assessed on longer days and earlier in the day on shorter days, whereas individuals in the other genotype groups (i.e., CC and CT) did not show this interaction effect. P10L genotype also interacted in an analogous way with daylength to predict self-reported morningness (interaction p sleep onset and chronotype as a function of daylength, whereas other genotypes at P10L do not seem to have effects that vary by daylength. A better understanding of how melanopsin confers heightened responsivity to daylength may improve our understanding of a broad range of

  1. Gene expression markers in circulating tumor cells may predict bone metastasis and response to hormonal treatment in breast cancer.

    Science.gov (United States)

    Wang, Haiying; Molina, Julian; Jiang, John; Ferber, Matthew; Pruthi, Sandhya; Jatkoe, Timothy; Derecho, Carlo; Rajpurohit, Yashoda; Zheng, Jian; Wang, Yixin

    2013-11-01

    Circulating tumor cells (CTCs) have recently attracted attention due to their potential as prognostic and predictive markers for the clinical management of metastatic breast cancer patients. The isolation of CTCs from patients may enable the molecular characterization of these cells, which may help establish a minimally invasive assay for the prediction of metastasis and further optimization of treatment. Molecular markers of proven clinical value may therefore be useful in predicting disease aggressiveness and response to treatment. In our earlier study, we identified a gene signature in breast cancer that appears to be significantly associated with bone metastasis. Among the genes that constitute this signature, trefoil factor 1 (TFF1) was identified as the most differentially expressed gene associated with bone metastasis. In this study, we investigated 25 candidate gene markers in the CTCs of metastatic breast cancer patients with different metastatic sites. The panel of the 25 markers was investigated in 80 baseline samples (first blood draw of CTCs) and 30 follow-up samples. In addition, 40 healthy blood donors (HBDs) were analyzed as controls. The assay was performed using quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) with RNA extracted from CTCs captured by the CellSearch system. Our study indicated that 12 of the genes were uniquely expressed in CTCs and 10 were highly expressed in the CTCs obtained from patients compared to those obtained from HBDs. Among these genes, the expression of keratin 19 was highly correlated with the CTC count. The TFF1 expression in CTCs was a strong predictor of bone metastasis and the patients with a high expression of estrogen receptor β in CTCs exhibited a better response to hormonal treatment. Molecular characterization of these genes in CTCs may provide a better understanding of the mechanism underlying tumor metastasis and identify gene markers in CTCs for predicting disease progression and

  2. Peripheral neuropathy predicts nuclear gene defect in patients with mitochondrial ophthalmoplegia.

    Science.gov (United States)

    Horga, Alejandro; Pitceathly, Robert D S; Blake, Julian C; Woodward, Catherine E; Zapater, Pedro; Fratter, Carl; Mudanohwo, Ese E; Plant, Gordon T; Houlden, Henry; Sweeney, Mary G; Hanna, Michael G; Reilly, Mary M

    2014-12-01

    Progressive external ophthalmoplegia is a common clinical feature in mitochondrial disease caused by nuclear DNA defects and single, large-scale mitochondrial DNA deletions and is less frequently associated with point mutations of mitochondrial DNA. Peripheral neuropathy is also a frequent manifestation of mitochondrial disease, although its prevalence and characteristics varies considerably among the different syndromes and genetic aetiologies. Based on clinical observations, we systematically investigated whether the presence of peripheral neuropathy could predict the underlying genetic defect in patients with progressive external ophthalmoplegia. We analysed detailed demographic, clinical and neurophysiological data from 116 patients with genetically-defined mitochondrial disease and progressive external ophthalmoplegia. Seventy-eight patients (67%) had a single mitochondrial DNA deletion, 12 (10%) had a point mutation of mitochondrial DNA and 26 (22%) had mutations in either POLG, C10orf2 or RRM2B, or had multiple mitochondrial DNA deletions in muscle without an identified nuclear gene defect. Seventy-seven patients had neurophysiological studies; of these, 16 patients (21%) had a large-fibre peripheral neuropathy. The prevalence of peripheral neuropathy was significantly lower in patients with a single mitochondrial DNA deletion (2%) as compared to those with a point mutation of mitochondrial DNA or with a nuclear DNA defect (44% and 52%, respectively; Pperipheral neuropathy as the only independent predictor associated with a nuclear DNA defect (P=0.002; odds ratio 8.43, 95% confidence interval 2.24-31.76). Multinomial logistic regression analysis identified peripheral neuropathy, family history and hearing loss as significant predictors of the genotype, and the same three variables showed the highest performance in genotype classification in a decision tree analysis. Of these variables, peripheral neuropathy had the highest specificity (91%), negative

  3. High-Throughput Gene Expression Profiles to Define Drug Similarity and Predict Compound Activity.

    Science.gov (United States)

    De Wolf, Hans; Cougnaud, Laure; Van Hoorde, Kirsten; De Bondt, An; Wegner, Joerg K; Ceulemans, Hugo; Göhlmann, Hinrich

    2018-04-01

    By adding biological information, beyond the chemical properties and desired effect of a compound, uncharted compound areas and connections can be explored. In this study, we add transcriptional information for 31K compounds of Janssen's primary screening deck, using the HT L1000 platform and assess (a) the transcriptional connection score for generating compound similarities, (b) machine learning algorithms for generating target activity predictions, and (c) the scaffold hopping potential of the resulting hits. We demonstrate that the transcriptional connection score is best computed from the significant genes only and should be interpreted within its confidence interval, for which we provide the statistics. These guidelines help to reduce noise, increase reproducibility, and enable the separation of specific and promiscuous compounds. The added value of machine learning is demonstrated for the NR3C1 and HSP90 targets. Support Vector Machine models yielded balanced accuracy values ≥80% when the expression values from DDIT4 & SERPINE1 and TMEM97 & SPR were used to predict the NR3C1 and HSP90 activity, respectively. Combining both models resulted in 22 new and confirmed HSP90-independent NR3C1 inhibitors, providing two scaffolds (i.e., pyrimidine and pyrazolo-pyrimidine), which could potentially be of interest in the treatment of depression (i.e., inhibiting the glucocorticoid receptor (i.e., NR3C1), while leaving its chaperone, HSP90, unaffected). As such, the initial hit rate increased by a factor of 300, as less, but more specific, chemistry could be screened, based on the upfront computed activity predictions.
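
    Balanced accuracy, the metric quoted above, is the mean of sensitivity and specificity, which matters when active compounds are rare in a screening deck. The sketch below assumes binary labels with 1 meaning active; the label vectors are hypothetical and unrelated to the study's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(y_true, y_pred):
    """Plain accuracy, shown for comparison on imbalanced data."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced screen: 2 actives among 10 compounds.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print("accuracy          =", accuracy(y_true, y_pred))
print("balanced accuracy =", balanced_accuracy(y_true, y_pred))
```

    In this toy case plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the actives were missed.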

  4. Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information.

    Science.gov (United States)

    Tang, Zaixiang; Shen, Yueping; Li, Yan; Zhang, Xinyan; Wen, Jia; Qian, Chen'ao; Zhuang, Wenzhuo; Shi, Xinghua; Yi, Nengjun

    2018-03-15

    Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces self-adaptive shrinkage amount on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method on several simulated scenarios, by varying the overlap among groups, group size, number of non-null groups, and the correlation within group. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
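
    The abstract gives no formulas, so the following is only a sketch of one common way to write a mixture double-exponential (spike-and-slab lasso) prior with group-specific inclusion probabilities. All symbols are assumptions introduced here: \beta_j is a coefficient, \gamma_j its inclusion indicator, s_0 and s_1 the spike and slab scales, and \theta_g the inclusion probability of group g.

```latex
\begin{align*}
\beta_j \mid \gamma_j &\sim \mathrm{DE}(0, S_j), \qquad
  p(\beta_j \mid S_j) = \frac{1}{2 S_j} \exp\!\left(-\frac{|\beta_j|}{S_j}\right), \\
S_j &= (1 - \gamma_j)\, s_0 + \gamma_j\, s_1, \qquad 0 < s_0 < s_1, \\
\gamma_j \mid \theta_g &\sim \mathrm{Bernoulli}(\theta_g), \qquad j \in \text{group } g .
\end{align*}
```

    Under this parameterization a coefficient with \gamma_j = 0 receives the small scale s_0 and is shrunk strongly toward zero, while \gamma_j = 1 gives the larger scale s_1 and weak shrinkage, which is one way to read the "self-adaptive shrinkage" described above.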

  5. Prediction of target genes for miR-140-5p in pulmonary arterial hypertension using bioinformatics methods.

    Science.gov (United States)

    Li, Fangwei; Shi, Wenhua; Wan, Yixin; Wang, Qingting; Feng, Wei; Yan, Xin; Wang, Jian; Chai, Limin; Zhang, Qianqian; Li, Manxiang

    2017-12-01

    The expression of microRNA (miR)-140-5p is known to be reduced in both pulmonary arterial hypertension (PAH) patients and monocrotaline-induced PAH models in rat. Identification of target genes for miR-140-5p with bioinformatics analysis may reveal new pathways and connections in PAH. This study aimed to explore downstream target genes and relevant signaling pathways regulated by miR-140-5p to provide theoretical evidence for further research on the role of miR-140-5p in PAH. Multiple downstream target genes and upstream transcription factors (TFs) of miR-140-5p were predicted in the analysis. Gene ontology (GO) enrichment analysis indicated that downstream target genes of miR-140-5p were enriched in many biological processes, such as biological regulation, signal transduction, response to chemical stimulus, stem cell proliferation, and cell surface receptor signaling pathways. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis found that downstream target genes were mainly located in the Notch, TGF-beta, PI3K/Akt, and Hippo signaling pathways. According to the TF-miRNA-mRNA network, the important downstream target genes of miR-140-5p were PPI, TGF-betaR1, smad4, JAG1, ADAM10, FGF9, PDGFRA, VEGFA, LAMC1, TLR4, and CREB. After thoroughly reviewing published literature, we found that 23 target genes and seven signaling pathways were truly inhibited by miR-140-5p in various tissues or cells; most of these verified targets were in accordance with our present prediction. Other predicted targets still need further verification in vivo and in vitro.

  6. Prediction of the prognosis of breast cancer in routine histologic specimens using a simplified, low-cost gene expression signature

    DEFF Research Database (Denmark)

    Marcell, S.A.; Balazs, A.; Emese, A.

    2013-01-01

    Prediction of the prognosis of breast cancer in routine histologic specimens using a simplified, low-cost gene expression signature Background: Grade 2 breast carcinomas do not form a uniform prognostic group. Aim: To extend the number of patients and the investigated genes of a previously...... grade 2 breast carcinomas into prognostic groups. Gene expression was investigated by polymerase chain reaction in 249 formalin-fixed, paraffin-embedded breast tumors. The results were correlated with relapse-free survival. Results: Histologically grade 2 carcinomas were split into good and a poor...... identified prognostic signature described by the authors that reflect chromosomal instability in order to refine characterization of grade 2 breast cancers and identify driver genes. Methods: Using publicly available databases, the authors selected 9 target and 3 housekeeping genes that are capable to divide...

  7. Prediction of the contact sensitizing potential of chemicals using analysis of gene expression changes in human THP-1 monocytes.

    Science.gov (United States)

    Arkusz, Joanna; Stępnik, Maciej; Sobala, Wojciech; Dastych, Jarosław

    2010-11-10

    The aim of this study was to find differentially regulated genes in THP-1 monocytic cells exposed to sensitizers and nonsensitizers and to investigate if such genes could be reliable markers for an in vitro predictive method for the identification of skin sensitizing chemicals. Changes in expression of 35 genes in the THP-1 cell line following treatment with chemicals of different sensitizing potential (from nonsensitizers to extreme sensitizers) were assessed using real-time PCR. Verification of 13 candidate genes by testing a large number of chemicals (an additional 22 sensitizers and 8 nonsensitizers) revealed that prediction of contact sensitization potential was possible based on evaluation of changes in three genes: IL8, HMOX1 and PAIMP1. In total, changes in expression of these genes allowed correct detection of sensitization potential of 21 out of 27 (78%) test sensitizers. The gene expression levels inside potency groups varied and did not allow estimation of sensitization potency of test chemicals. Results of this study indicate that evaluation of changes in expression of proposed biomarkers in THP-1 cells could be a valuable model for preliminary screening of chemicals to discriminate an appreciable majority of sensitizers from nonsensitizers.

  8. Prediction and analysis of three gene families related to leaf rust (Puccinia triticina) resistance in wheat (Triticum aestivum L.).

    Science.gov (United States)

    Peng, Fred Y; Yang, Rong-Cai

    2017-06-20

    The resistance to leaf rust (Lr) caused by Puccinia triticina in wheat (Triticum aestivum L.) has been well studied over the past decades with over 70 Lr genes being mapped on different chromosomes and numerous QTLs (quantitative trait loci) being detected or mapped using DNA markers. Such resistance is often divided into race-specific and race-nonspecific resistance. The race-nonspecific resistance can be further divided into resistance to most or all races of the same pathogen and resistance to multiple pathogens. At the molecular level, these three types of resistance may cover across the whole spectrum of pathogen specificities that are controlled by genes encoding different protein families in wheat. The objective of this study is to predict and analyze genes in three such families: NBS-LRR (nucleotide-binding sites and leucine-rich repeats or NLR), START (Steroidogenic Acute Regulatory protein [STaR] related lipid-transfer) and ABC (ATP-Binding Cassette) transporter. The focus of the analysis is on the patterns of relationships between these protein-coding genes within the gene families and QTLs detected for leaf rust resistance. We predicted 526 ABC, 1117 NLR and 144 START genes in the hexaploid wheat genome through a domain analysis of wheat proteome. Of the 1809 SNPs from leaf rust resistance QTLs in seedling and adult stages of wheat, 126 SNPs were found within coding regions of these genes or their neighborhood (5 Kb upstream from transcription start site [TSS] or downstream from transcription termination site [TTS] of the genes). Forty-three of these SNPs for adult resistance and 18 SNPs for seedling resistance reside within coding or neighboring regions of the ABC genes whereas 14 SNPs for adult resistance and 29 SNPs for seedling resistance reside within coding or neighboring regions of the NLR gene. Moreover, we found 17 nonsynonymous SNPs for adult resistance and five SNPs for seedling resistance in the ABC genes, and five nonsynonymous SNPs for

  9. Predicting the use of corporal punishment: Child aggression, parent religiosity, and the BDNF gene.

    Science.gov (United States)

    Avinun, Reut; Davidov, Maayan; Mankuta, David; Knafo-Noam, Ariel

    2018-03-01

    Corporal punishment (CP) has been associated with deleterious child outcomes, highlighting the importance of understanding its underpinnings. Although several factors have been linked with parents' CP use, genetic influences on CP have rarely been studied, and an integrative view examining the interplay between different predictors of CP is missing. We focused on the separate and joint effects of religiosity, child aggression, parent's gender, and a valine (Val) to methionine (Met) substitution in the brain-derived neurotrophic factor (BDNF) gene. Data came from a twin sample (51% male, aged 6.5 years). We used mothers' and fathers' self-reports of CP and religiosity, and the other parent's report on child aggression. Complete data were available for 244 mothers and their 466 children, and for 217 fathers and their 409 children. The random split method was employed to examine replicability. For mothers, only the effect of religiosity appeared to replicate. For fathers, several effects predicting CP use replicated in both samples: child aggression, child sex, religiosity, and a three-way (GxExE) interaction implicating fathers' BDNF genotype, child aggression and religiosity. Religious fathers who carried the Met allele and had an aggressive child used CP more frequently; in contrast, secular fathers' CP use was not affected by their BDNF genotype or child aggression. Results were also repeated longitudinally in a subsample with age 8-9 data. Findings highlight the utility of a bio-ecological approach for studying CP use by shedding light on pertinent gene-environment interaction processes. Possible implications for intervention and public policy are discussed.

  10. In Silico Analysis of Microarray-Based Gene Expression Profiles Predicts Tumor Cell Response to Withanolides

    Directory of Open Access Journals (Sweden)

    Thomas Efferth

    2012-05-01

    Withania somnifera (L.) Dunal (Indian ginseng, winter cherry, Solanaceae) is widely used in traditional medicine. Roots are either chewed or used to prepare beverages (aqueous decocts). The major secondary metabolites of Withania somnifera are the withanolides, which are C-28-steroidal lactone triterpenoids. Withania somnifera extracts exert chemopreventive and anticancer activities in vitro and in vivo. The aims of the present in silico study were, firstly, to investigate whether tumor cells develop cross-resistance between standard anticancer drugs and withanolides and, secondly, to elucidate the molecular determinants of sensitivity and resistance of tumor cells towards withanolides. Using IC50 concentrations of eight different withanolides (withaferin A, withaferin A diacetate, 3-azerininylwithaferin A, withafastuosin D diacetate, 4-B-hydroxy-withanolide E, isowithanololide E, withafastuosin E, and withaperuvin) and 19 established anticancer drugs, we analyzed the cross-resistance profile of 60 tumor cell lines. The cell lines revealed cross-resistance between the eight withanolides. Consistent cross-resistance between withanolides and nitrosoureas (carmustin, lomustin, and semimustin) was also observed. Then, we performed transcriptomic microarray-based COMPARE and hierarchical cluster analyses of mRNA expression to identify mRNA expression profiles predicting sensitivity or resistance towards withanolides. Genes from diverse functional groups were significantly associated with response of tumor cells to withaferin A diacetate, e.g. genes functioning in DNA damage and repair, stress response, cell growth regulation, extracellular matrix components, cell adhesion and cell migration, constituents of the ribosome, cytoskeletal organization and regulation, signal transduction, transcription factors, and others.

  11. A common polymorphism in a Williams syndrome gene predicts amygdala reactivity and extraversion in healthy adults

    Science.gov (United States)

    Swartz, Johnna R.; Waller, Rebecca; Bogdan, Ryan; Knodt, Annchen R.; Sabhlok, Aditi; Hyde, Luke W.; Hariri, Ahmad R.

    2015-01-01

    Background: Williams syndrome (WS), a genetic disorder resulting from hemizygous microdeletion of chromosome 7q11.23, has emerged as a model for identifying the genetic architecture of socioemotional behavior. Recently, common polymorphisms in GTF2I, which is found within the WS microdeletion, have been associated with reduced social anxiety in the general population. Identifying neural phenotypes affected by these polymorphisms will help advance our understanding not only of this specific genetic association but also the broader neurogenetic mechanisms of variability in socioemotional behavior. Methods: Through an ongoing parent protocol, the Duke Neurogenetics Study, we measured threat-related amygdala reactivity to fearful and angry facial expressions using functional MRI (fMRI), assessed trait personality using the Revised NEO Personality Inventory, and imputed GTF2I rs13227433 from saliva-derived DNA using custom Illumina arrays. Participants included 808 non-Hispanic Caucasian, African American, and Asian university students. Results: The GTF2I rs13227433 AA genotype, previously associated with lower social anxiety, predicted decreased threat-related amygdala reactivity. An indirect effect of GTF2I genotype on the warmth facet of extraversion was mediated by decreased threat-related amygdala reactivity in women but not men. Conclusions: A common polymorphism in the WS gene GTF2I associated with reduced social anxiety predicts decreased threat-related amygdala reactivity, which mediates an association between genotype and increased warmth in women. These results are consistent with reduced threat-related amygdala reactivity in WS and suggest that common variation in GTF2I contributes to broader variability in socioemotional brain function and behavior, with implications for understanding the neurogenetic bases of WS as well as social anxiety. PMID:26853120

  12. Interaction between serotonin transporter gene variants and life events predicts response to antidepressants in the GENDEP project

    DEFF Research Database (Denmark)

    Keers, R.; Uher, R.; Huezo-Diaz, P.

    2011-01-01

    , and several polymorphisms in the serotonin transporter gene (SLC6A4) have been genotyped including the serotonin transporter-linked polymorphic region (5-HTTLPR). Stressful life events were shown to predict a significantly better response to escitalopram but had no effect on response to nortriptyline...

  13. Interactions between Serotonin Transporter Gene Haplotypes and Quality of Mothers' Parenting Predict the Development of Children's Noncompliance

    Science.gov (United States)

    Sulik, Michael J.; Eisenberg, Nancy; Lemery-Chalfant, Kathryn; Spinrad, Tracy L.; Silva, Kassondra M.; Eggum, Natalie D.; Betkowski, Jennifer A.; Kupfer, Anne; Smith, Cynthia L.; Gaertner, Bridget; Stover, Daryn A.; Verrelli, Brian C.

    2012-01-01

    The LPR and STin2 polymorphisms of the serotonin transporter gene (SLC6A4) were combined into haplotypes that, together with quality of maternal parenting, were used to predict initial levels and linear change in children's (N = 138) noncompliance and aggression from age 18-54 months. Quality of mothers' parenting behavior was observed when…

  14. Arabidopsis CPR5 is a senescence-regulatory gene with pleiotropic functions as predicted by the evolutionary theory of senescence

    NARCIS (Netherlands)

    Jing, Hai-Chun; Anderson, Lisa; Sturre, Marcel J. G.; Hille, Jacques; Dijkwel, Paul P.

    2007-01-01

    Arabidopsis CPR5 is a senescence-regulatory gene with pleiotropic functions as predicted by the evolutionary theory of senescence Hai-Chun Jing1,2, Lisa Anderson3, Marcel J.G. Sturre1, Jacques Hille1 and Paul P. Dijkwel1,* 1Molecular Biology of Plants, Groningen Biomolecular Sciences and

  15. Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information

    Directory of Open Access Journals (Sweden)

    Lemke Ney

    2009-09-01

    Background: The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential gene discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. Results: We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. Conclusion: We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing

  16. SVM classifier to predict genes important for self-renewal and pluripotency of mouse embryonic stem cells

    Directory of Open Access Journals (Sweden)

    Xu Huilei

    2010-12-01

    Background: Mouse embryonic stem cells (mESCs) are derived from the inner cell mass of a developing blastocyst and can be cultured indefinitely in vitro. Their distinct features are their ability to self-renew and to differentiate to all adult cell types. Genes that maintain mESCs self-renewal and pluripotency identity are of interest to stem cell biologists. Although significant steps have been made toward the identification and characterization of such genes, the list is still incomplete and controversial. For example, the overlap among candidate self-renewal and pluripotency genes across different RNAi screens is surprisingly small. Meanwhile, machine learning approaches have been used to analyze multi-dimensional experimental data and integrate results from many studies, yet they have not been applied to specifically tackle the task of predicting and classifying self-renewal and pluripotency gene membership. Results: For this study we developed a classifier, a supervised machine learning framework for predicting self-renewal and pluripotency mESCs stemness membership genes (MSMG) using support vector machines (SVM). The data used to train the classifier was derived from mESCs-related studies using mRNA microarrays, measuring gene expression in various stages of early differentiation, as well as ChIP-seq studies applied to mESCs profiling genome-wide binding of key transcription factors, such as Nanog, Oct4, and Sox2, to the regulatory regions of other genes. Comparison to other classification methods using the leave-one-out cross-validation method was employed to evaluate the accuracy and generality of the classification. Finally, two sets of candidate genes from genome-wide RNA interference screens are used to test the generality and potential application of the classifier. Conclusions: Our results reveal that an SVM approach can be useful for prioritizing genes for functional validation experiments and complement the analyses of high
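    The leave-one-out cross-validation mentioned in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: a nearest-centroid rule stands in for the SVM so that the example needs only the standard library, and the feature vectors, labels and function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class whose mean vector is closest to x."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and report the fraction predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-feature data: label 1 = candidate stemness gene, 0 = other gene.
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_y = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(toy_x, toy_y))
```

    Any classifier with the same `predict(train_x, train_y, x)` shape could be passed in instead of the stand-in, which is how several methods would be compared under the same hold-out scheme.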

  17. Assessment of the predictive accuracy of five in silico prediction tools, alone or in combination, and two metaservers to classify long QT syndrome gene mutations.

    Science.gov (United States)

    Leong, Ivone U S; Stuckey, Alexander; Lai, Daniel; Skinner, Jonathan R; Love, Donald R

    2015-05-13

    Long QT syndrome (LQTS) is an autosomal dominant condition predisposing to sudden death from malignant arrhythmia. Genetic testing identifies many missense single nucleotide variants of uncertain pathogenicity. Establishing genetic pathogenicity is an essential prerequisite to family cascade screening. Many laboratories use in silico prediction tools, either alone or in combination, or metaservers, in order to predict pathogenicity; however, their accuracy in the context of LQTS is unknown. We evaluated the accuracy of five in silico programs and two metaservers in the analysis of LQTS 1-3 gene variants. The in silico tools SIFT, PolyPhen-2, PROVEAN, SNPs&GO and SNAP, either alone or in all possible combinations, and the metaservers Meta-SNP and PredictSNP, were tested on 312 KCNQ1, KCNH2 and SCN5A gene variants that have previously been characterised by either in vitro or co-segregation studies as either "pathogenic" (283) or "benign" (29). The accuracy, sensitivity, specificity and Matthews Correlation Coefficient (MCC) were calculated to determine the best combination of in silico tools for each LQTS gene, and when all genes are combined. The best combination of in silico tools for KCNQ1 is PROVEAN, SNPs&GO and SIFT (accuracy 92.7%, sensitivity 93.1%, specificity 100% and MCC 0.70). The best combination of in silico tools for KCNH2 is SIFT and PROVEAN or PROVEAN, SNPs&GO and SIFT. Both combinations have the same scores for accuracy (91.1%), sensitivity (91.5%), specificity (87.5%) and MCC (0.62). In the case of SCN5A, SNAP and PROVEAN provided the best combination (accuracy 81.4%, sensitivity 86.9%, specificity 50.0%, and MCC 0.32). When all three LQT genes are combined, SIFT, PROVEAN and SNAP is the combination with the best performance (accuracy 82.7%, sensitivity 83.0%, specificity 80.0%, and MCC 0.44). Both metaservers performed better than the single in silico tools; however, they did not perform better than the best performing combination of in silico

  18. Multi-gene genetic programming based predictive models for municipal solid waste gasification in a fluidized bed gasifier.

    Science.gov (United States)

    Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold

    2015-03-01

    A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming models are also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. PRGPred: A platform for prediction of domains of resistance gene analogue (RGA) in Arecaceae developed using machine learning algorithms

    Directory of Open Access Journals (Sweden)

    MATHODIYIL S. MANJULA

    2015-12-01

    Plant disease resistance genes (R-genes) are responsible for initiation of defense mechanisms against various phytopathogens. The majority of plant R-genes are members of very large multi-gene families, which encode structurally related proteins containing nucleotide binding site domains (NBS) and C-terminal leucine rich repeats (LRR). Other classes possess an extracellular LRR domain, a transmembrane domain and sometimes an intracellular serine/threonine kinase domain. R-proteins work in pathogen perception and/or the activation of conserved defense signaling networks. In the present study, sequences representing resistance gene analogues (RGAs) of coconut, arecanut, oil palm and date palm were collected from NCBI, sorted based on domains and assembled into a database. The sequences were analyzed in the PRINTS database to find the conserved domains and their motifs present in the RGAs. Based on these domains, we have also developed a tool to predict the domains of palm R-genes using various machine learning algorithms. The model files were selected based on the performance of the best classifier in training and testing. All this information is stored and made available in the online 'PRGpred' database and prediction tool.

  20. No specific gene expression signature in human granulosa and cumulus cells for prediction of oocyte fertilisation and embryo implantation.

    Directory of Open Access Journals (Sweden)

    Tanja Burnik Papler

    In human IVF procedures, objective and reliable biomarkers of oocyte and embryo quality are needed in order to increase the use of single embryo transfer (SET) and thus prevent multiple pregnancies. During folliculogenesis there is an intense bi-directional communication between the oocyte and follicular cells. For this reason the gene expression profile of follicular cells could be an important indicator and biomarker of oocyte and embryo quality. The objective of this study was to identify gene expression signature(s) in human granulosa (GC) and cumulus (CC) cells predictive of successful embryo implantation and oocyte fertilization. Forty-one patients were included in the study and individual GC and CC samples were collected; oocytes were cultivated separately, allowing a correlation with IVF outcome, and elective SET was performed. Gene expression analysis was performed using microarrays, followed by a quantitative real-time PCR validation. After statistical analysis of microarray data, there were no significantly differentially expressed genes (FDR < 0.05) between non-fertilized and fertilized oocytes or between non-implanted and implanted embryos in either cell type. Furthermore, the results of quantitative real-time PCR were in agreement with the microarray data, as there were no significant differences in expression of the genes selected for validation. In conclusion, we did not find biomarkers for prediction of oocyte fertilization and embryo implantation in IVF procedures in the present study.
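    The record applies a false discovery rate threshold of 0.05 but does not say which FDR procedure was used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```

    A gene is called differentially expressed only when its adjusted value falls below the chosen threshold, so a microarray with thousands of probes can legitimately yield no significant genes even when some raw p-values are small.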

  1. DNA methylation of the oxytocin receptor gene predicts neural response to ambiguous social stimuli

    Directory of Open Access Journals (Sweden)

    Allison Jack

    2012-10-01

    Oxytocin and its receptor (OXTR) play an important role in a variety of social perceptual and affiliative processes. Individual variability in social information processing likely has a strong heritable component, and as such, many investigations have established an association between common genetic variants of OXTR and variability in the social phenotype. However, to date, these investigations have primarily focused only on changes in the sequence of DNA without considering the role of epigenetic factors. DNA methylation is an epigenetic mechanism by which cells control transcription through modification of chromatin structure. DNA methylation of OXTR decreases expression of the gene and high levels of methylation have been associated with autism spectrum disorders. This link between epigenetic variability and social phenotype allows for the possibility that social processes are under epigenetic control. We hypothesized that the level of DNA methylation of OXTR would predict individual variability in social perception. Using the brain’s sensitivity to displays of animacy as a neural endophenotype of social perception, we found significant associations between the degree of OXTR methylation and brain activity evoked by the perception of animacy. Our results suggest that consideration of DNA methylation may substantially improve our ability to explain individual differences in imaging genetic association studies.

  2. Predicting human miRNA target genes using a novel evolutionary methodology

    KAUST Repository

    Aigli, Korfiati; Kleftogiannis, Dimitrios A.; Konstantinos, Theofilatos; Spiros, Likothanassis; Athanasios, Tsakalidis; Seferina, Mavroudi

    2012-01-01

    The discovery of miRNAs had a great impact on traditional biology. Typically, miRNAs have the potential to bind to the 3' untranslated region (UTR) of their mRNA target genes for cleavage or translational repression. The experimental identification of their targets has many drawbacks, including cost, time and low specificity, and these are the reasons why many computational approaches have been developed so far. However, existing computational approaches do not include any advanced feature selection technique and they face problems concerning their classification performance and their interpretability. In the present paper, we propose a novel hybrid methodology which combines genetic algorithms and support vector machines in order to locate the optimal feature subset while achieving high classification performance. The proposed methodology was compared with two of the most promising existing methodologies in the problem of predicting human miRNA targets. Our approach outperforms existing methodologies in terms of classification performance while selecting a much smaller feature subset. © 2012 Springer-Verlag.

  3. Predicting human miRNA target genes using a novel evolutionary methodology

    KAUST Repository

    Aigli, Korfiati

    2012-01-01

    The discovery of miRNAs had a great impact on traditional biology. Typically, miRNAs have the potential to bind to the 3' untranslated region (UTR) of their mRNA target genes for cleavage or translational repression. The experimental identification of their targets has many drawbacks, including cost, time and low specificity, and these are the reasons why many computational approaches have been developed so far. However, existing computational approaches do not include any advanced feature selection technique and they face problems concerning their classification performance and their interpretability. In the present paper, we propose a novel hybrid methodology which combines genetic algorithms and support vector machines in order to locate the optimal feature subset while achieving high classification performance. The proposed methodology was compared with two of the most promising existing methodologies in the problem of predicting human miRNA targets. Our approach outperforms existing methodologies in terms of classification performance while selecting a much smaller feature subset. © 2012 Springer-Verlag.

  4. Optimized outcome prediction in breast cancer by combining the 70-gene signature with clinical risk prediction algorithms

    NARCIS (Netherlands)

    Drukker, C.A.; Nijenhuis, M.V.; Bueno de Mesquita, J.M.; Retel, V.P.; Retel, Valesca; van Harten, Willem H.; van Tinteren, H.; Wesseling, J.; Schmidt, M.K.; van 't Veer, L.J.; Sonke, G.S.; Rutgers, E.J.T.; van de Vijver, M.J.; Linn, S.C.

    2014-01-01

    Clinical guidelines for breast cancer treatment differ in their selection of patients at a high risk of recurrence who are eligible to receive adjuvant systemic treatment (AST). The 70-gene signature is a molecular tool to better guide AST decisions. The aim of this study was to evaluate whether

  5. Predicting response to primary chemotherapy: gene expression profiling of paraffin-embedded core biopsy tissue.

    Science.gov (United States)

    Mina, Lida; Soule, Sharon E; Badve, Sunil; Baehner, Fredrick L; Baker, Joffre; Cronin, Maureen; Watson, Drew; Liu, Mei-Lan; Sledge, George W; Shak, Steve; Miller, Kathy D

    2007-06-01

    Primary chemotherapy provides an ideal opportunity to correlate gene expression with response to treatment. We used paraffin-embedded core biopsies from a completed phase II trial to identify genes that correlate with response to primary chemotherapy. Patients with newly diagnosed stage II or III breast cancer were treated with sequential doxorubicin 75 mg/m² q2 wks x 3 and docetaxel 40 mg/m² weekly x 6; treatment order was randomly assigned. Pretreatment core biopsy samples were interrogated for genes that might correlate with pathologic complete response (pCR). In addition to the individual genes, the correlation of the Oncotype DX Recurrence Score with pCR was examined. Of 70 patients enrolled in the parent trial, core biopsy samples with sufficient RNA for gene analyses were available from 45 patients; 9 (20%) had inflammatory breast cancer (IBC). Six (14%) patients achieved a pCR. Twenty-two of the 274 candidate genes assessed correlated with pCR (p < 0.05). Genes correlating with pCR could be grouped into three large clusters: angiogenesis-related genes, proliferation-related genes, and invasion-related genes. Expression of estrogen receptor (ER)-related genes and Recurrence Score did not correlate with pCR. In an exploratory analysis we compared gene expression in IBC to non-inflammatory breast cancer; twenty-four (9%) of the genes were differentially expressed (p < 0.05), 5 were upregulated and 19 were downregulated in IBC. Gene expression analysis on core biopsy samples is feasible and identifies candidate genes that correlate with pCR to primary chemotherapy. Gene expression in IBC differs significantly from non-inflammatory breast cancer.

  6. Use of tiling array data and RNA secondary structure predictions to identify noncoding RNA genes

    DEFF Research Database (Denmark)

    Weile, Christian; Gardner, Paul P; Hedegaard, Mads M

    2007-01-01

    neuroblastoma cell line SK-N-AS. Using this strategy, we identify thousands of human candidate RNA genes. To further verify the expression of these genes, we focused on candidate genes that had stable hairpin structures or a high level of covariance. Using northern blotting, we verify the expression of 2 out...

  7. ColoLipidGene: signature of lipid metabolism-related genes to predict prognosis in stage-II colon cancer patients

    Science.gov (United States)

    Vargas, Teodoro; Moreno-Rubio, Juan; Herranz, Jesús; Cejas, Paloma; Molina, Susana; González-Vallinas, Margarita; Mendiola, Marta; Burgos, Emilio; Aguayo, Cristina; Custodio, Ana B.; Machado, Isidro; Ramos, David; Gironella, Meritxell; Espinosa-Salinas, Isabel; Ramos, Ricardo; Martín-Hernández, Roberto; Risueño, Alberto; De Las Rivas, Javier; Reglero, Guillermo; Yaya, Ricardo; Fernández-Martos, Carlos; Aparicio, Jorge; Maurel, Joan; Feliu, Jaime; de Molina, Ana Ramírez

    2015-01-01

    Lipid metabolism plays an essential role in carcinogenesis due to the requirements of tumoral cells to sustain increased structural, energetic and biosynthetic precursor demands for cell proliferation. We investigated the association between expression of lipid metabolism-related genes and clinical outcome in intermediate-stage colon cancer patients with the aim of identifying a metabolic profile associated with greater malignancy and increased risk of relapse. The expression profile of 70 lipid metabolism-related genes was determined in 77 patients with stage II colon cancer. Cox regression analyses using c-index methodology were applied to identify a metabolic-related signature associated with prognosis. The metabolic signature was further confirmed in two independent validation sets of 120 patients and, additionally, in a group of 264 patients from a public database. The combined analysis of these 4 genes, ABCA1, ACSL1, AGPAT1 and SCD, constitutes a metabolic signature (ColoLipidGene) able to accurately stratify stage II colon cancer patients with 5-fold higher risk of relapse with strong statistical power in the four independent groups of patients. The identification of a group of 4 genes that predict survival in intermediate-stage colon cancer patients allows delineation of a high-risk group that may benefit from adjuvant therapy, and avoids toxic and unnecessary chemotherapy in patients classified as low-risk. PMID:25749516
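    The record refers to "c-index methodology" without giving its definition. The sketch below assumes the common Harrell's concordance index: among pairs of patients where the earlier time is an observed relapse, it counts how often the patient who relapsed earlier also had the higher risk score. The follow-up times, event flags and scores are hypothetical, and no Cox model is fitted here.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index: 1.0 = perfect ranking, 0.5 = random, 0.0 = fully inverted."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in the score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: months to relapse, event flag (1 = relapse, 0 = censored), signature score.
months = [12, 30, 45, 60]
relapsed = [1, 1, 0, 0]
score = [2.1, 1.4, 0.3, 0.9]
print(concordance_index(months, relapsed, score))
```

    Comparing this value for candidate gene sets is one way a signature could be selected, since a higher c-index means the score orders patients more consistently with their observed relapse times.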

  8. In Silico Prediction of Horizontal Gene Transfer Events in Lactobacillus bulgaricus and Streptococcus thermophilus Reveals Protocooperation in Yogurt Manufacturing

    Science.gov (United States)

    Liu, Mengjin; Siezen, Roland J.; Nauta, Arjen

    2009-01-01

    Lactobacillus bulgaricus and Streptococcus thermophilus, used in yogurt starter cultures, are well known for their stability and protocooperation during their coexistence in milk. In this study, we show that a close interaction between the two species also takes place at the genetic level. We performed an in silico analysis, combining gene composition and gene transfer mechanism-associated features, and predicted horizontally transferred genes in both L. bulgaricus and S. thermophilus. Putative horizontal gene transfer (HGT) events that have occurred between the two bacterial species include the transfer of exopolysaccharide (EPS) biosynthesis genes, transferred from S. thermophilus to L. bulgaricus, and the gene cluster cbs-cblB(cglB)-cysE for the metabolism of sulfur-containing amino acids, transferred from L. bulgaricus or Lactobacillus helveticus to S. thermophilus. The HGT event for the cbs-cblB(cglB)-cysE gene cluster was analyzed in detail, with respect to both evolutionary and functional aspects. It can be concluded that during the coexistence of both yogurt starter species in a milk environment, agonistic coevolution at the genetic level has probably been involved in the optimization of their combined growth and interactions. PMID:19395564

  9. In silico prediction of horizontal gene transfer events in Lactobacillus bulgaricus and Streptococcus thermophilus reveals protocooperation in yogurt manufacturing.

    Science.gov (United States)

    Liu, Mengjin; Siezen, Roland J; Nauta, Arjen

    2009-06-01

    Lactobacillus bulgaricus and Streptococcus thermophilus, used in yogurt starter cultures, are well known for their stability and protocooperation during their coexistence in milk. In this study, we show that a close interaction between the two species also takes place at the genetic level. We performed an in silico analysis, combining gene composition and gene transfer mechanism-associated features, and predicted horizontally transferred genes in both L. bulgaricus and S. thermophilus. Putative horizontal gene transfer (HGT) events that have occurred between the two bacterial species include the transfer of exopolysaccharide (EPS) biosynthesis genes, transferred from S. thermophilus to L. bulgaricus, and the gene cluster cbs-cblB(cglB)-cysE for the metabolism of sulfur-containing amino acids, transferred from L. bulgaricus or Lactobacillus helveticus to S. thermophilus. The HGT event for the cbs-cblB(cglB)-cysE gene cluster was analyzed in detail, with respect to both evolutionary and functional aspects. It can be concluded that during the coexistence of both yogurt starter species in a milk environment, agonistic coevolution at the genetic level has probably been involved in the optimization of their combined growth and interactions.

  10. Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor.

    Science.gov (United States)

    Vasselli, James R; Shih, Joanna H; Iyengar, Shuba R; Maranchie, Jodi; Riss, Joseph; Worrell, Robert; Torres-Cabala, Carlos; Tabios, Ray; Mariotti, Andra; Stearman, Robert; Merino, Maria; Walther, McClellan M; Simon, Richard; Klausner, Richard D; Linehan, W Marston

    2003-06-10

    To identify potential molecular determinants of tumor biology and possible clinical outcomes, global gene-expression patterns were analyzed in the primary tumors of patients with metastatic renal cell cancer by using cDNA microarrays. We used grossly dissected tumor masses that included tumor, blood vessels, connective tissue, and infiltrating immune cells to obtain a gene-expression "profile" from each primary tumor. Two patterns of gene expression were found within this uniformly staged patient population, which correlated with a significant difference in overall survival between the two patient groups. Subsets of genes most significantly associated with survival were defined, and vascular cell adhesion molecule-1 (VCAM-1) was the gene most predictive for survival. Therefore, despite the complex biological nature of metastatic cancer, basic clinical behavior as defined by survival may be determined by the gene-expression patterns expressed within the compilation of primary gross tumor cells. We conclude that survival in patients with metastatic renal cell cancer can be correlated with the expression of various genes based solely on the expression profile in the primary kidney tumor.

  11. The cure: design and evaluation of a crowdsourcing game for gene selection for breast cancer survival prediction.

    Science.gov (United States)

    Good, Benjamin M; Loguercio, Salvatore; Griffith, Obi L; Nanis, Max; Wu, Chunlei; Su, Andrew I

    2014-07-29

    Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the player's prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach, and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10 year survival. Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. In terms of the predictive accuracy of models trained using this

  12. Genomic instability of osteosarcoma cell lines in culture: impact on the prediction of metastasis relevant genes.

    Science.gov (United States)

    Muff, Roman; Rath, Prisni; Ram Kumar, Ram Mohan; Husmann, Knut; Born, Walter; Baudis, Michael; Fuchs, Bruno

    2015-01-01

    Osteosarcoma is a rare but highly malignant cancer of the bone. As a consequence, the number of established cell lines used for experimental in vitro and in vivo osteosarcoma research is limited and the value of these cell lines relies on their stability during culture. Here we investigated the stability in gene expression by microarray analysis and array genomic hybridization of three low metastatic cell lines and derivatives thereof with increased metastatic potential using cells of different passages. The osteosarcoma cell lines showed altered gene expression during in vitro culture, and it was more pronounced in two metastatic cell lines compared to the respective parental cells. Chromosomal instability contributed in part to the altered gene expression in SAOS and LM5 cells with low and high metastatic potential. To identify metastasis-relevant genes in a background of passage-dependent altered gene expression, genes involved in "Pathways in cancer" that were consistently regulated under all passage comparisons were evaluated. Genes belonging to "Hedgehog signaling pathway" and "Wnt signaling pathway" were significantly up-regulated, and IHH, WNT10B and TCF7 were found up-regulated in all three metastatic compared to the parental cell lines. Considerable instability during culture in terms of gene expression and chromosomal aberrations was observed in osteosarcoma cell lines. The use of cells from different passages and a search for genes consistently regulated in early and late passages allows the analysis of metastasis-relevant genes despite the observed instability in gene expression in osteosarcoma cell lines during culture.

  13. Accurate prediction of the functional significance of single nucleotide polymorphisms and mutations in the ABCA1 gene.

    Directory of Open Access Journals (Sweden)

    Liam R Brunham

    2005-12-01

    The human genome contains an estimated 100,000 to 300,000 DNA variants that alter an amino acid in an encoded protein. However, our ability to predict which of these variants are functionally significant is limited. We used a bioinformatics approach to define the functional significance of genetic variation in the ABCA1 gene, a cholesterol transporter crucial for the metabolism of high density lipoprotein cholesterol. To predict the functional consequence of each coding single nucleotide polymorphism and mutation in this gene, we calculated a substitution position-specific evolutionary conservation score for each variant, which considers site-specific variation among evolutionarily related proteins. To test the bioinformatics predictions experimentally, we evaluated the biochemical consequence of these sequence variants by examining the ability of cell lines stably transfected with the ABCA1 alleles to elicit cholesterol efflux. Our bioinformatics approach correctly predicted the functional impact of greater than 94% of the naturally occurring variants we assessed. The bioinformatics predictions were significantly correlated with the degree of functional impairment of ABCA1 mutations (r² = 0.62, p = 0.0008). These results have allowed us to define the impact of genetic variation on ABCA1 function and to suggest that the in silico evolutionary approach we used may be a useful tool in general for predicting the effects of DNA variation on gene function. In addition, our data suggest that considering patterns of positive selection, along with patterns of negative selection such as evolutionary conservation, may improve our ability to predict the functional effects of amino acid variation.

  14. An 80-gene set to predict response to preoperative chemoradiotherapy for rectal cancer by principle component analysis.

    Science.gov (United States)

    Empuku, Shinichiro; Nakajima, Kentaro; Akagi, Tomonori; Kaneko, Kunihiko; Hijiya, Naoki; Etoh, Tsuyoshi; Shiraishi, Norio; Moriyama, Masatsugu; Inomata, Masafumi

    2016-05-01

    Preoperative chemoradiotherapy (CRT) for locally advanced rectal cancer not only improves the postoperative local control rate, but also induces downstaging. However, it has not been established how to individually select patients who receive effective preoperative CRT. The aim of this study was to identify a predictor of response to preoperative CRT for locally advanced rectal cancer. This study is additional to our multicenter phase II study evaluating the safety and efficacy of preoperative CRT using oral fluorouracil (UMIN ID: 03396). From April 2009 to August 2011, 26 biopsy specimens obtained prior to CRT were analyzed by cyclopedic microarray analysis. Response to CRT was evaluated according to a histological grading system using surgically resected specimens. To decide on the number of genes for dividing into responder and non-responder groups, we statistically analyzed the data using a dimension reduction method, principal component analysis. Of the 26 cases, 11 were responders and 15 non-responders. No significant difference was found in clinical background data between the two groups. We determined that the optimal number of genes for the prediction of response was 80 of 40,000 and the functions of these genes were analyzed. When comparing non-responders with responders, genes expressed at a high level functioned in alternative splicing, whereas those expressed at a low level functioned in the septin complex. Thus, an 80-gene expression set that predicts response to preoperative CRT for locally advanced rectal cancer was identified using a novel statistical method.

  15. Common and rare variants in the exons and regulatory regions of osteoporosis-related genes improve osteoporotic fracture risk prediction.

    Science.gov (United States)

    Lee, Seung Hun; Kang, Moo Il; Ahn, Seong Hee; Lim, Kyeong-Hye; Lee, Gun Eui; Shin, Eun-Soon; Lee, Jong-Eun; Kim, Beom-Jun; Cho, Eun-Hee; Kim, Sang-Wook; Kim, Tae-Ho; Kim, Hyun-Ju; Yoon, Kun-Ho; Lee, Won Chul; Kim, Ghi Su; Koh, Jung-Min; Kim, Shin-Yoon

    2014-11-01

    Osteoporotic fracture risk is highly heritable, but genome-wide association studies have explained only a small proportion of the heritability to date. Genetic data may improve prediction of fracture risk in osteopenic subjects and assist early intervention and management. To detect common and rare variants in coding and regulatory regions related to osteoporosis-related traits, and to investigate whether genetic profiling improves the prediction of fracture risk. This cross-sectional study was conducted in three clinical units in Korea. Postmenopausal women with extreme phenotypes (n = 982) were used for the discovery set, and 3895 participants were used for the replication set. We performed targeted resequencing of 198 genes. Genetic risk scores from common variants (GRS-C) and from common and rare variants (GRS-T) were calculated. Nineteen common variants in 17 genes (of the discovered 34 functional variants in 26 genes) and 31 rare variants in five genes (of the discovered 87 functional variants in 15 genes) were associated with one or more osteoporosis-related traits. Accuracy of fracture risk classification was improved in the osteopenic patients by adding GRS-C to fracture risk assessment models (6.8%; P risk in an osteopenic individual.

  16. Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

    Directory of Open Access Journals (Sweden)

    Chi-Cheng Huang

    2013-01-01

    Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least square (PLS) regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as training dataset, and three independent microarray studies with Han Chinese origin were used for independent validation (n=535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to many more samples being left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
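
    The agreement statistic quoted in the record above can be illustrated with a minimal sketch of Cohen's weighted kappa. The abstract does not say which weighting scheme was used, so the linear disagreement weights below are an assumption, as are the function name, the integer coding of the subtypes, and the toy label lists (they are not data from the study).

```python
def weighted_kappa(labels_a, labels_b, n_categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    labels_a, labels_b: integer category codes 0..k-1 from two classifiers.
    Linear weighting is an assumption; the source abstract does not specify it.
    """
    n = len(labels_a)
    k = n_categories
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    if k < 2:
        raise ValueError("need at least two categories")

    # Cross-tabulate the two sets of calls.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            obs_disagreement += weight * observed[i][j] / n
            # Expected count under independence of the two classifiers.
            exp_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)

    if exp_disagreement == 0.0:
        return 1.0  # both classifiers used one and the same single category
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical calls from two classifiers on six samples (three categories).
ssp_calls = [0, 0, 1, 1, 2, 2]
pls_calls = [0, 0, 1, 2, 2, 2]
kappa = weighted_kappa(ssp_calls, pls_calls, 3)
```

    With these toy inputs the two classifiers disagree on one sample by one category step, so kappa falls below 1 but stays high. For nominal categories such as tumor subtypes, the ordering implied by the integer codes is itself a modelling choice.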

  17. Construction of a novel multi-gene assay (42-gene classifier) for prediction of late recurrence in ER-positive breast cancer patients.

    Science.gov (United States)

    Tsunashima, Ryo; Naoi, Yasuto; Shimazu, Kenzo; Kagara, Naofumi; Shimoda, Masashi; Tanei, Tomonori; Miyake, Tomohiro; Kim, Seung Jin; Noguchi, Shinzaburo

    2018-05-04

    Prediction models for late (> 5 years) recurrence in ER-positive breast cancer need to be developed for the accurate selection of patients for extended hormonal therapy. We attempted to develop such a prediction model focusing on the differences in gene expression between breast cancers with early and late recurrence. For the training set, 779 ER-positive breast cancers treated with tamoxifen alone for 5 years were selected from the databases (GSE6532, GSE12093, GSE17705, and GSE26971). For the validation set, 221 ER-positive breast cancers treated with adjuvant hormonal therapy for 5 years with or without chemotherapy at our hospital were included. Gene expression was assayed by DNA microarray analysis (Affymetrix U133 plus 2.0). With the 42 genes differentially expressed in early and late recurrence breast cancers in the training set, a prediction model (42GC) for late recurrence was constructed. The patients classified by 42GC into the late recurrence-like group showed a significantly (P = 0.006) higher late recurrence rate as expected but a significantly (P = 1.62 × E-13) lower rate for early recurrence than non-late recurrence-like group. These observations were confirmed for the validation set, i.e., P = 0.020 for late recurrence and P = 5.70 × E-5 for early recurrence. We developed a unique prediction model (42GC) for late recurrence by focusing on the biological differences between breast cancers with early and late recurrence. Interestingly, patients in the late recurrence-like group by 42GC were at low risk for early recurrence.

  18. Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation.

    Science.gov (United States)

    Li, Min; Zhang, Jiayi; Liu, Qing; Wang, Jianxin; Wu, Fang-Xiang

    2014-01-01

    Predicting disease-related genes is one of the most important tasks in bioinformatics and systems biology. With the advances in high-throughput techniques, a large number of protein-protein interactions are available, which makes it possible to identify disease-related genes at the network level. However, network-based identification of disease-related genes is still a challenge because considerable numbers of false positives still exist in the currently available protein interaction networks (PIN). Considering the fact that the majority of genetic disorders tend to manifest only in a single or a few tissues, we constructed tissue-specific networks (TSN) by integrating PIN and tissue-specific data. We further weighted the constructed tissue-specific network (WTSN) by using DNA methylation, as it plays an irreplaceable role in the development of complex diseases. A PageRank-based method was developed to identify disease-related genes from the constructed networks. To validate the effectiveness of the proposed method, we constructed PIN, weighted PIN (WPIN), TSN, and WTSN for colon cancer and leukemia, respectively. The experimental results on colon cancer and leukemia show that the combination of tissue-specific data and DNA methylation can help to identify disease-related genes more accurately. Moreover, the PageRank-based method was effective in predicting disease-related genes in the case studies of colon cancer and leukemia. Tissue-specific data and DNA methylation are two important factors in the study of human diseases. The same method implemented on the WTSN can achieve better results than when it is implemented on the original PIN, WPIN, or TSN. The PageRank-based method outperforms the degree centrality-based method for identifying disease-related genes from WTSN.
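
    As a rough illustration of ranking genes on a weighted network, the sketch below runs standard weighted PageRank by power iteration on an undirected graph. It is not the authors' implementation: the abstract gives no algorithmic details, so the damping factor, the function name, the node labels and the edge weights (standing in for methylation-derived weights) are all assumptions.

```python
def weighted_pagerank(edges, damping=0.85, max_iter=200, tol=1e-12):
    """Weighted PageRank on an undirected graph, by power iteration.

    edges: dict mapping (node_u, node_v) -> positive edge weight.
    Returns a dict node -> score; scores sum to 1.
    Assumes every weight is > 0, so no node has zero total strength.
    """
    # Build a symmetric adjacency map from the edge list.
    adjacency = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, {})[v] = weight
        adjacency.setdefault(v, {})[u] = weight

    nodes = sorted(adjacency)
    n = len(nodes)
    strength = {u: sum(adjacency[u].values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(max_iter):
        # Teleportation term, shared equally by all nodes.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        # Each node passes its score to neighbours in proportion to edge weight.
        for u in nodes:
            for v, weight in adjacency[u].items():
                new_rank[v] += damping * rank[u] * weight / strength[u]
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy network: gene "A" interacts with "B", "C" and "D".
example_edges = {("A", "B"): 1.0, ("A", "C"): 1.0, ("A", "D"): 1.0}
scores = weighted_pagerank(example_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

    In this toy star network the hub "A" receives the highest score, and the three leaves tie. A degree-centrality baseline, by contrast, would rank nodes by `strength` alone without propagating scores through the network.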

  19. Hierarchy in gene expression is predictive of risk, progression, and outcome in adult acute myeloid leukemia

    Science.gov (United States)

    Tripathi, Shubham; Deem, Michael W.

    2015-02-01

    Cancer progresses with a change in the structure of the gene network in normal cells. We define a measure of organizational hierarchy in gene networks of affected cells in adult acute myeloid leukemia (AML) patients. With a retrospective cohort analysis based on the gene expression profiles of 116 AML patients, we find that the likelihood of future cancer relapse and the level of clinical risk are directly correlated with the level of organization in the cancer related gene network. We also explore the variation of the level of organization in the gene network with cancer progression. We find that this variation is non-monotonic, which implies the fitness landscape in the evolution of AML cancer cells is non-trivial. We further find that the hierarchy in gene expression at the time of diagnosis may be a useful biomarker in AML prognosis.

  20. Hierarchy in gene expression is predictive of risk, progression, and outcome in adult acute myeloid leukemia

    International Nuclear Information System (INIS)

    Tripathi, Shubham; Deem, Michael W

    2015-01-01

    Cancer progresses with a change in the structure of the gene network in normal cells. We define a measure of organizational hierarchy in gene networks of affected cells in adult acute myeloid leukemia (AML) patients. With a retrospective cohort analysis based on the gene expression profiles of 116 AML patients, we find that the likelihood of future cancer relapse and the level of clinical risk are directly correlated with the level of organization in the cancer related gene network. We also explore the variation of the level of organization in the gene network with cancer progression. We find that this variation is non-monotonic, which implies the fitness landscape in the evolution of AML cancer cells is non-trivial. We further find that the hierarchy in gene expression at the time of diagnosis may be a useful biomarker in AML prognosis.

  1. A Predictive Coexpression Network Identifies Novel Genes Controlling the Seed-to-Seedling Phase Transition in Arabidopsis thaliana.

    Science.gov (United States)

    Silva, Anderson Tadeu; Ribone, Pamela A; Chan, Raquel L; Ligterink, Wilco; Hilhorst, Henk W M

    2016-04-01

    The transition from a quiescent dry seed to an actively growing photoautotrophic seedling is a complex and crucial trait for plant propagation. This study provides a detailed description of global gene expression in seven successive developmental stages of seedling establishment in Arabidopsis (Arabidopsis thaliana). Using the transcriptome signature from these developmental stages, we obtained a coexpression gene network that highlights interactions between known regulators of the seed-to-seedling transition and predicts the functions of uncharacterized genes in seedling establishment. The coexpressed gene data sets together with the transcriptional module indicate biological functions related to seedling establishment. Characterization of the homeodomain leucine zipper I transcription factor AtHB13, which is expressed during the seed-to-seedling transition, demonstrated that this gene regulates some of the network nodes and affects late seedling establishment. Knockout mutants for athb13 showed increased primary root length as compared with wild-type (Columbia-0) seedlings, suggesting that this transcription factor is a negative regulator of early root growth, possibly repressing cell division and/or cell elongation or the length of time that cells elongate. The signal transduction pathways present during the early phases of the seed-to-seedling transition anticipate the control of important events for a vigorous seedling, such as root growth. This study demonstrates that a gene coexpression network together with transcriptional modules can provide insights that are not derived from comparative transcript profiling alone. © 2016 American Society of Plant Biologists. All Rights Reserved.

  2. Efficient CRISPR/Cas9-Mediated Versatile, Predictable, and Donor-Free Gene Knockout in Human Pluripotent Stem Cells.

    Science.gov (United States)

    Liu, Zhongliang; Hui, Yi; Shi, Lei; Chen, Zhenyu; Xu, Xiangjie; Chi, Liankai; Fan, Beibei; Fang, Yujiang; Liu, Yang; Ma, Lin; Wang, Yiran; Xiao, Lei; Zhang, Quanbin; Jin, Guohua; Liu, Ling; Zhang, Xiaoqing

    2016-09-13

    Loss-of-function studies in human pluripotent stem cells (hPSCs) require efficient methodologies for lesion of genes of interest. Here, we introduce a donor-free paired gRNA-guided CRISPR/Cas9 knockout strategy (paired-KO) for efficient and rapid gene ablation in hPSCs. Through paired-KO, we succeeded in targeting all genes of interest with high biallelic targeting efficiencies. More importantly, during paired-KO, the cleaved DNA was repaired mostly through direct end joining without insertions/deletions (precise ligation), and thus makes the lesion product predictable. The paired-KO remained highly efficient for one-step targeting of multiple genes and was also efficient for targeting of microRNA, while for long non-coding RNA over 8 kb, cleavage of a short fragment of the core promoter region was sufficient to eradicate downstream gene transcription. This work suggests that the paired-KO strategy is a simple and robust system for loss-of-function studies for both coding and non-coding genes in hPSCs. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  3. Tumour gene expression predicts response to cetuximab in patients with KRAS wild-type metastatic colorectal cancer.

    Science.gov (United States)

    Baker, J B; Dutta, D; Watson, D; Maddala, T; Munneke, B M; Shak, S; Rowinsky, E K; Xu, L-A; Harbison, C T; Clark, E A; Mauro, D J; Khambata-Ford, S

    2011-02-01

    Although it is accepted that metastatic colorectal cancers (mCRCs) that carry activating mutations in KRAS are unresponsive to anti-epidermal growth factor receptor (EGFR) monoclonal antibodies, a significant fraction of KRAS wild-type (wt) mCRCs are also unresponsive to anti-EGFR therapy. Genes encoding EGFR ligands amphiregulin (AREG) and epiregulin (EREG) are promising gene expression-based markers but have not been incorporated into a test to dichotomise KRAS wt mCRC patients with respect to sensitivity to anti-EGFR treatment. We used RT-PCR to test 110 candidate gene expression markers in primary tumours from 144 KRAS wt mCRC patients who received monotherapy with the anti-EGFR antibody cetuximab. Results were correlated with multiple clinical endpoints: disease control, objective response, and progression-free survival (PFS). Expression of many of the tested candidate genes, including EREG and AREG, strongly associate with all clinical endpoints. Using multivariate analysis with two-layer five-fold cross-validation, we constructed a four-gene predictive classifier. Strikingly, patients below the classifier cutpoint had PFS and disease control rates similar to those of patients with KRAS mutant mCRC. Gene expression appears to identify KRAS wt mCRC patients who receive little benefit from cetuximab. It will be important to test this model in an independent validation study.

  4. Discovering biomarkers from gene expression data for predicting cancer subgroups using neural networks and relational fuzzy clustering

    Directory of Open Access Journals (Sweden)

    Sharma Animesh

    2007-01-01

    Background: The four heterogeneous childhood cancers, neuroblastoma, non-Hodgkin lymphoma, rhabdomyosarcoma, and Ewing sarcoma, present a similar histology of small round blue cell tumor (SRBCT), which often leads to misdiagnosis. Identification of biomarkers for distinguishing these cancers is a well studied problem. Existing methods typically evaluate each gene separately and do not take into account the nonlinear interaction between genes and the tools that are used to design the diagnostic prediction system. Consequently, more genes are usually identified as necessary for prediction. We propose a general scheme for finding a small set of biomarkers to design a diagnostic system for accurate classification of the cancer subgroups. We use multilayer networks with online gene selection ability and relational fuzzy clustering to identify a small set of biomarkers for accurate classification of the training and blind test cases of a well studied data set. Results: Our method discerned just seven biomarkers that precisely categorized the four subgroups of cancer both in training and blind samples. For the same problem, others suggested 19–94 genes. These seven biomarkers include three novel genes (NAB2, LSP1 and EHD1, not identified by others) with distinct class-specific signatures and important roles in cancer biology, including cellular proliferation, transendothelial migration and trafficking of MHC class antigens. Interestingly, NAB2 is downregulated in other tumors including non-Hodgkin lymphoma and neuroblastoma, but we observed moderate to high upregulation in a few cases of Ewing sarcoma and rhabdomyosarcoma, suggesting that NAB2 might be mutated in these tumors. These genes can discover the subgroups correctly with unsupervised learning, can differentiate non-SRBCT samples, and perform equally well with other machine learning tools including support vector machines. These biomarkers lead to four simple human interpretable

  5. Predicting Recurrence and Progression of Noninvasive Papillary Bladder Cancer at Initial Presentation Based on Quantitative Gene Expression Profiles

    DEFF Research Database (Denmark)

    Birkhahn, M.; Mitra, A.P.; Williams, Johan

    2010-01-01

    Background: Currently, tumor grade is the best predictor of outcome at first presentation of noninvasive papillary (Ta) bladder cancer. However, reliable predictors of Ta tumor recurrence and progression for individual patients, which could optimize treatment and follow-up schedules based on specific tumor biology, are yet to be identified. Objective: To identify genes predictive for recurrence and progression in Ta bladder cancer at first presentation using a quantitative, pathway-specific approach. Design, setting, and participants: Retrospective study of patients with Ta G2/3 bladder tumors... ...% specificity. Since this is a small retrospective study using medium-throughput profiling, larger confirmatory studies are needed. Conclusions: Gene expression profiling across relevant cancer pathways appears to be a promising approach for Ta bladder tumor outcome prediction at initial diagnosis...

  6. Genomic instability of osteosarcoma cell lines in culture: impact on the prediction of metastasis relevant genes.

    Directory of Open Access Journals (Sweden)

    Roman Muff

    Osteosarcoma is a rare but highly malignant cancer of the bone. As a consequence, the number of established cell lines used for experimental in vitro and in vivo osteosarcoma research is limited and the value of these cell lines relies on their stability during culture. Here we investigated the stability in gene expression by microarray analysis and array genomic hybridization of three low metastatic cell lines and derivatives thereof with increased metastatic potential using cells of different passages. The osteosarcoma cell lines showed altered gene expression during in vitro culture, and it was more pronounced in two metastatic cell lines compared to the respective parental cells. Chromosomal instability contributed in part to the altered gene expression in SAOS and LM5 cells with low and high metastatic potential. To identify metastasis-relevant genes in a background of passage-dependent altered gene expression, genes involved in "Pathways in cancer" that were consistently regulated under all passage comparisons were evaluated. Genes belonging to "Hedgehog signaling pathway" and "Wnt signaling pathway" were significantly up-regulated, and IHH, WNT10B and TCF7 were found up-regulated in all three metastatic compared to the parental cell lines. Considerable instability during culture in terms of gene expression and chromosomal aberrations was observed in osteosarcoma cell lines. The use of cells from different passages and a search for genes consistently regulated in early and late passages allows the analysis of metastasis-relevant genes despite the observed instability in gene expression in osteosarcoma cell lines during culture.

  7. Gene trio signatures as molecular markers to predict response to doxorubicin cyclophosphamide neoadjuvant chemotherapy in breast cancer patients

    Directory of Open Access Journals (Sweden)

    M.C. Barros Filho

    2010-12-01

    In breast cancer patients submitted to neoadjuvant chemotherapy (4 cycles of doxorubicin and cyclophosphamide, AC), expression of groups of three genes (gene trio signatures) could distinguish responsive from non-responsive tumors, as demonstrated by cDNA microarray profiling in a previous study by our group. In the current study, we determined if the expression of the same genes would retain the predictive strength, when analyzed by a more accessible technique (real-time RT-PCR). We evaluated 28 samples already analyzed by cDNA microarray, as a technical validation procedure, and 14 tumors, as an independent biological validation set. All patients received neoadjuvant chemotherapy (4 AC). Among five trio combinations previously identified, defined by nine genes individually investigated (BZRP, CLPTM1, MTSS1, NOTCH1, NUP210, PRSS11, RPL37A, SMYD2, and XLHSRF-1), the most accurate were established by RPL37A, XLHSRF-1-based trios, with NOTCH1 or NUP210. Both trios correctly separated 86% of tumors (87% sensitivity and 80% specificity for predicting response), according to their response to chemotherapy (82% in a leave-one-out cross-validation method). Using the pre-established features obtained by linear discriminant analysis, 71% of samples from the biological validation set were also correctly classified by both trios (72% sensitivity; 66% specificity). Furthermore, we explored other gene combinations to achieve a higher accuracy in the technical validation group (as a training set). A new trio, MTSS1, RPL37 and SMYD2, correctly classified 93% of samples from the technical validation group (95% sensitivity and 80% specificity; 86% accuracy by the cross-validation method) and 79% from the biological validation group (72% sensitivity and 100% specificity). Therefore, the combined expression of MTSS1, RPL37 and SMYD2, as evaluated by real-time RT-PCR, is a potential candidate to predict response to neoadjuvant doxorubicin and cyclophosphamide in breast cancer
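
    The percentages in the record above follow the usual confusion-matrix definitions, which the short sketch below works through. The abstract does not report the underlying counts: the numbers used here (20 true positives, 3 false negatives, 4 true negatives, 1 false positive) are hypothetical, chosen only because they add up to 28 samples and reproduce the reported 87% / 80% / 86% after rounding. Treating responders as the positive class is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: responders classified correctly / incorrectly (positive class).
    tn/fp: non-responders classified correctly / incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # fraction of responders detected
        "specificity": tn / (tn + fp),   # fraction of non-responders detected
        "accuracy": (tp + tn) / total,   # fraction of all samples correct
    }


# Hypothetical counts for a 28-sample set (not taken from the source).
metrics = classification_metrics(tp=20, fn=3, tn=4, fp=1)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```

    With so few non-responders in such a split, a single misclassified sample moves specificity by 20 percentage points, which is worth keeping in mind when comparing the technical and biological validation figures.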

  8. Advanced colorectal adenoma related gene expression signature may predict prognostic for colorectal cancer patients with adenoma-carcinoma sequence.

    Science.gov (United States)

    Li, Bing; Shi, Xiao-Yu; Liao, Dai-Xiang; Cao, Bang-Rong; Luo, Cheng-Hua; Cheng, Shu-Jun

    2015-01-01

    There are still no absolute parameters predicting progression of adenoma into cancer. The present study aimed to characterize functional differences on the multistep carcinogenetic process from the adenoma-carcinoma sequence. All samples were collected and mRNA expression profiling was performed by using Agilent Microarray high-throughput gene-chip technology. Then, the characteristics of mRNA expression profiles of adenoma-carcinoma sequence were described with bioinformatics software, and we analyzed the relationship between gene expression profiles of adenoma-adenocarcinoma sequence and clinical prognosis of colorectal cancer. The mRNA expressions of adenoma-carcinoma sequence were significantly different between high-grade intraepithelial neoplasia group and adenocarcinoma group. The biological process of gene ontology function enrichment analysis on differentially expressed genes between high-grade intraepithelial neoplasia group and adenocarcinoma group showed that genes enriched in the extracellular structure organization, skeletal system development, biological adhesion and itself regulated growth regulation, with the P value after FDR correction of less than 0.05. In addition, IPR-related protein mainly focused on the insulin-like growth factor binding proteins. The variable trends of gene expression profiles for adenoma-carcinoma sequence were mainly concentrated in high-grade intraepithelial neoplasia and adenocarcinoma. The differentially expressed genes are significantly correlated between high-grade intraepithelial neoplasia group and adenocarcinoma group. Bioinformatics analysis is an effective way to study the gene expression profiles in the adenoma-carcinoma sequence, and may provide an effective tool to involve colorectal cancer research strategy into colorectal adenoma or advanced adenoma.

  9. Prediction of Metastasis and Recurrence in Colorectal Cancer Based on Gene Expression Analysis: Ready for the Clinic?

    International Nuclear Information System (INIS)

    Shibayama, Masaki; Maak, Matthias; Nitsche, Ulrich; Gotoh, Kengo; Rosenberg, Robert; Janssen, Klaus-Peter

    2011-01-01

    Cancers of the colon and rectum, which rank among the most frequent human tumors, are currently treated by surgical resection in locally restricted tumor stages. However, disease recurrence and formation of local and distant metastasis frequently occur even in cases with successful curative resection of the primary tumor (R0). Recent technological advances in molecular diagnostic analysis have led to a wealth of knowledge about the changes in gene transcription in all stages of colorectal tumors. Differential gene expression, or transcriptome analysis, has been proposed by many groups to predict disease recurrence, clinical outcome, and also response to therapy, in addition to the well-established clinico-pathological factors. However, the clinical usability of gene expression profiling as a reliable and robust prognostic tool that allows evidence-based clinical decisions is currently under debate. In this review, we will discuss the most recent data on the prognostic significance and potential clinical application of genome wide expression analysis in colorectal cancer.

  10. Prediction of Metastasis and Recurrence in Colorectal Cancer Based on Gene Expression Analysis: Ready for the Clinic?

    Energy Technology Data Exchange (ETDEWEB)

    Shibayama, Masaki [Sysmex Corporation, Central Research Laboratories, Kobe 651-2271 (Japan); Maak, Matthias; Nitsche, Ulrich [Chirurgische Klinik, Klinikum Rechts der Isar der TUM, München 81657 (Germany); Gotoh, Kengo [Sysmex Corporation, Central Research Laboratories, Kobe 651-2271 (Japan); Rosenberg, Robert; Janssen, Klaus-Peter, E-mail: klaus-peter.janssen@lrz.tum.de [Chirurgische Klinik, Klinikum Rechts der Isar der TUM, München 81657 (Germany)

    2011-07-07

    Cancers of the colon and rectum, which rank among the most frequent human tumors, are currently treated by surgical resection in locally restricted tumor stages. However, disease recurrence and formation of local and distant metastasis frequently occur even in cases with successful curative resection of the primary tumor (R0). Recent technological advances in molecular diagnostic analysis have led to a wealth of knowledge about the changes in gene transcription in all stages of colorectal tumors. Differential gene expression, or transcriptome analysis, has been proposed by many groups to predict disease recurrence, clinical outcome, and also response to therapy, in addition to the well-established clinico-pathological factors. However, the clinical usability of gene expression profiling as a reliable and robust prognostic tool that allows evidence-based clinical decisions is currently under debate. In this review, we will discuss the most recent data on the prognostic significance and potential clinical application of genome wide expression analysis in colorectal cancer.

  11. Advanced colorectal adenoma related gene expression signature may predict prognostic for colorectal cancer patients with adenoma-carcinoma sequence

    OpenAIRE

    Li, Bing; Shi, Xiao-Yu; Liao, Dai-Xiang; Cao, Bang-Rong; Luo, Cheng-Hua; Cheng, Shu-Jun

    2015-01-01

    Background: There are still no absolute parameters predicting progression of adenoma into cancer. The present study aimed to characterize functional differences on the multistep carcinogenetic process from the adenoma-carcinoma sequence. Methods: All samples were collected and mRNA expression profiling was performed by using Agilent Microarray high-throughput gene-chip technology. Then, the characteristics of mRNA expression profiles of adenoma-carcinoma sequence were described with bioinform...

  12. Prognostic and predictive value of VHL gene alteration in renal cell carcinoma: a meta-analysis and review.

    Science.gov (United States)

    Kim, Bum Jun; Kim, Jung Han; Kim, Hyeong Su; Zang, Dae Young

    2017-02-21

    The von Hippel-Lindau (VHL) gene is often inactivated in sporadic renal cell carcinoma (RCC) by mutation or promoter hypermethylation. The prognostic or predictive value of VHL gene alteration is not well established. We conducted this meta-analysis to evaluate the association between the VHL alteration and clinical outcomes in patients with RCC. We searched PUBMED, MEDLINE and EMBASE for articles including the following terms in their titles, abstracts, or keywords: 'kidney or renal', 'carcinoma or cancer or neoplasm or malignancy', 'von Hippel-Lindau or VHL', 'alteration or mutation or methylation', and 'prognostic or predictive'. Six studies fulfilled the inclusion criteria, and a total of 663 patients with clear cell RCC were included in the study: 244 patients who received anti-vascular endothelial growth factor (VEGF) therapy in the predictive value analysis and 419 in the prognostic value analysis. Out of 663 patients, 410 (61.8%) had VHL alteration. The meta-analysis showed no association between the VHL gene alteration and overall response rate (relative risk = 1.47 [95% CI, 0.81-2.67], P = 0.20) or progression free survival (hazard ratio = 1.02 [95% CI, 0.72-1.44], P = 0.91) in patients with RCC who received VEGF-targeted therapy. There was also no correlation between the VHL alteration and overall survival (HR = 0.80 [95% CI, 0.56-1.14], P = 0.21). In conclusion, this meta-analysis indicates that VHL gene alteration has no prognostic or predictive value in patients with clear cell RCC.
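
    The arithmetic behind the reported figures can be checked with a short Python sketch. It only recomputes numbers quoted in the abstract (cohort sizes, the share of patients with VHL alteration, and whether each 95% confidence interval contains the no-effect value of 1); the helper name `ci_includes_null` is an illustrative assumption, not something taken from the paper.

```python
# Cohort sizes as reported in the abstract.
predictive_n = 244   # patients in the predictive value analysis
prognostic_n = 419   # patients in the prognostic value analysis
altered_n = 410      # patients with VHL alteration

total_n = predictive_n + prognostic_n
altered_pct = round(100 * altered_n / total_n, 1)


def ci_includes_null(lower: float, upper: float, null: float = 1.0) -> bool:
    """Return True when a ratio's confidence interval contains the no-effect value."""
    return lower <= null <= upper


# (point estimate, CI lower bound, CI upper bound) as reported in the abstract.
reported = {
    "overall response rate (RR)": (1.47, 0.81, 2.67),
    "progression free survival (HR)": (1.02, 0.72, 1.44),
    "overall survival (HR)": (0.80, 0.56, 1.14),
}
null_inside = {
    name: ci_includes_null(lower, upper)
    for name, (_, lower, upper) in reported.items()
}

print(total_n, altered_pct, null_inside)
```

    All three intervals contain 1, which is consistent with the conclusion that no association was detected.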

  13. Gene Expression Signature TOPFOX Reflecting Chromosomal Instability Refines Prediction of Prognosis in Grade 2 Breast Cancer

    DEFF Research Database (Denmark)

    Szasz, A.; Li, Qiyuan; Sztupinszki, Z.

    2011-01-01

    Purpose: To assess the ability of genes selected from those reflecting chromosomal instability to identify good and poor prognostic subsets of Grade 2 breast carcinomas. Methods: We selected genes for splitting grade 2 tumours into low and high grade type groups by using public databases. Patient...

  14. SVMRFE based approach for prediction of most discriminatory gene target for type II diabetes

    Directory of Open Access Journals (Sweden)

    Atul Kumar

    2017-06-01

    Type II diabetes is a chronic condition that affects the way the body metabolizes sugar, its important source of fuel, and it is now a chronic disease found all over the world. It is necessary to identify new potential drug targets that not only control the disease but can also treat it. Support vector machines (SVMs) are classifiers that can separate discriminatory genes from non-discriminatory genes. SVMRFE, a modification of SVM, ranks genes by their discriminatory power and eliminates genes that are not involved in causing the disease. A gene regulatory network was built from the top-ranked coding genes to identify their role in causing diabetes. To further validate the results, a pathway study was performed to identify the involvement of the coding genes in type II diabetes. The genes obtained from this study showed significant involvement in causing the disease and may be used as potential drug targets.
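
    The ranking-and-elimination idea can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the linear SVM is trained here by plain subgradient descent on the hinge loss, the squared weight is used as the ranking criterion (the usual choice in SVM-RFE), one feature is dropped per round, and the gene names, expression values, and hyperparameters are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear soft-margin SVM by full-batch subgradient descent (hinge loss)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:              # only margin violators contribute
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, names):
    """Rank features from most to least discriminatory by recursive elimination."""
    remaining = list(range(len(names)))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Drop the feature with the smallest squared weight in this round.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return [names[j] for j in reversed(eliminated)]


# Hypothetical toy data: rows are samples, columns are genes, labels are +1 / -1.
genes = ["geneA", "geneB", "geneC"]
X = [[2.0, 0.0, 0.5], [3.0, 0.0, 0.2], [-2.0, 0.0, -0.5], [-3.0, 0.0, -0.2]]
y = [1, 1, -1, -1]
ranking = svm_rfe(X, y, genes)
print(ranking)
```

    In this toy setting the uninformative gene is eliminated first and the strongly separating gene survives longest; real expression data would need scaling, cross-validation, and a tested SVM implementation.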

  15. Predicting Essential Genes and Proteins Based on Machine Learning and Network Topological Features: A Comprehensive Review

    Science.gov (United States)

    Zhang, Xue; Acencio, Marcio Luis; Lemke, Ney

    2016-01-01

    Essential proteins/genes are indispensable to the survival or reproduction of an organism, and the deletion of such essential proteins will result in lethality or infertility. The identification of essential genes is very important not only for understanding the minimal requirements for survival of an organism, but also for finding human disease genes and new drug targets. Experimental methods for identifying essential genes are costly, time-consuming, and laborious. With the accumulation of sequenced genomes data and high-throughput experimental data, many computational methods for identifying essential proteins are proposed, which are useful complements to experimental methods. In this review, we show the state-of-the-art methods for identifying essential genes and proteins based on machine learning and network topological features, point out the progress and limitations of current methods, and discuss the challenges and directions for further research. PMID:27014079

  16. Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

    Science.gov (United States)

    Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

    2015-12-01

    Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.
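
    For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from confusion-matrix counts in Python. The counts in the example are hypothetical and chosen only to show the calculation; they are not the paper's data.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: coding sequences as positives, non-coding as negatives.
tp, tn, fp, fn = 9999, 9999, 1, 1
print(accuracy(tp, tn, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the coding and non-coding classes are unbalanced.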

  17. Predictive Value of Gene Polymorphisms on Recurrence after the Withdrawal of Antithyroid Drugs in Patients with Graves’ Disease

    Directory of Open Access Journals (Sweden)

    Jia Liu

    2017-09-01

    Graves’ disease (GD) is one of the most common endocrine diseases. Antithyroid drug (ATD) treatment is frequently used as the first-choice therapy for GD patients in most countries because of its superior safety and tolerability. However, GD patients treated with ATDs have a relatively high recurrence rate after drug withdrawal, which is a main limitation of ATD treatment. It is of great importance to identify predictors of higher recurrence risk in GD patients, which may facilitate an appropriate therapeutic approach for a given patient at the time of GD diagnosis. Genetic factors are widely believed to play an important role in the pathogenesis of GD. A growing number of studies have investigated the relationship between gene polymorphisms and the recurrence risk in GD patients. In this article, we review the current literature to highlight the predictive value of gene polymorphisms on recurrence risk in GD patients after ATD withdrawal. Some gene polymorphisms, such as CTLA4 rs231775 and human leukocyte antigen polymorphisms (DRB1*03, DQA1*05, and DQB1*02), might be associated with high recurrence risk in GD patients. Further prospective studies on patients of different ethnicities, especially studies with large sample sizes and long-term follow-up, should be conducted to confirm the predictive roles of gene polymorphisms.

  18. The accuracy of survival time prediction for patients with glioma is improved by measuring mitotic spindle checkpoint gene expression.

    Directory of Open Access Journals (Sweden)

    Li Bie

    Identification of gene expression changes that improve prediction of survival time across all glioma grades would be clinically useful. Four Affymetrix GeneChip datasets from the literature, containing data from 771 glioma samples representing all WHO grades and eight normal brain samples, were used in an ANOVA model to screen for transcript changes that correlated with grade. Observations were confirmed and extended using qPCR assays on RNA derived from 38 additional glioma samples and eight normal samples for which survival data were available. RNA levels of eight major mitotic spindle assembly checkpoint (SAC) genes (BUB1, BUB1B, BUB3, CENPE, MAD1L1, MAD2L1, CDC20, TTK) significantly correlated with glioma grade and six also significantly correlated with survival time. In particular, the level of BUB1B expression was highly correlated with survival time (p<0.0001), and significantly outperformed all other measured parameters, including two standards: WHO grade and MIB-1 (Ki-67) labeling index. Measurement of the expression levels of a small set of SAC genes may complement histological grade and other clinical parameters for predicting survival time.

  19. Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

    International Nuclear Information System (INIS)

    Yu Jia-Feng; Sui Tian-Xiang; Wang Ji-Hua; Wang Hong-Mei; Wang Chun-Ling; Jing Li

    2015-01-01

    Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. (special topic)

  20. Increased production of free fatty acids in Aspergillus oryzae by disruption of a predicted acyl-CoA synthetase gene.

    Science.gov (United States)

    Tamano, Koichi; Bruno, Kenneth S; Koike, Hideaki; Ishii, Tomoko; Miura, Ai; Umemura, Myco; Culley, David E; Baker, Scott E; Machida, Masayuki

    2015-04-01

    Fatty acids are attractive molecules as source materials for the production of biodiesel fuel. Previously, we attained a 2.4-fold increase in fatty acid production by increasing the expression of fatty acid synthesis-related genes in Aspergillus oryzae. In this study, we achieved an additional increase in the production of fatty acids by disrupting a predicted acyl-CoA synthetase gene in A. oryzae. The A. oryzae genome is predicted to encode six acyl-CoA synthetase genes and disruption of AO090011000642, one of the six genes, resulted in a 9.2-fold higher accumulation (corresponding to an increased production of 0.23 mmol/g dry cell weight) of intracellular fatty acid in comparison to the wild-type strain. Furthermore, by introducing a niaD marker from Aspergillus nidulans to the disruptant, as well as changing the concentration of nitrogen in the culture medium from 10 to 350 mM, fatty acid productivity reached 0.54 mmol/g dry cell weight. Analysis of the relative composition of the major intracellular free fatty acids caused by disruption of AO090011000642 in comparison to the wild-type strain showed an increase in stearic acid (7 to 26 %), decrease in linoleic acid (50 to 27 %), and no significant changes in palmitic or oleic acid (each around 20-25 %).

  1. Histone modification profiles are predictive for tissue/cell-type specific expression of both protein-coding and microRNA genes

    Directory of Open Access Journals (Sweden)

    Zhang Michael Q

    2011-05-01

    Background: Gene expression is regulated at both the DNA sequence level and through modification of chromatin. However, the effect of chromatin on tissue/cell-type specific gene regulation (TCSR) is largely unknown. In this paper, we present a method to elucidate the relationship between histone modification/variation (HMV) and TCSR. Results: A classifier for differentiating CD4+ T cell-specific genes from housekeeping genes using HMV data was built. We found HMV in both promoter and gene body regions to be predictive of genes which are targets of TCSR. For example, the histone modification types H3K4me3 and H3K27ac were identified as the most predictive for CpG-related promoters, whereas H3K4me3 and H3K79me3 were the most predictive for nonCpG-related promoters. However, genes targeted by TCSR can be predicted using other types of HMVs as well. Such redundancy implies that multiple types of underlying regulatory elements, such as enhancers or intragenic alternative promoters, which can regulate gene expression in a tissue/cell-type specific fashion, may be marked by the HMVs. Finally, we show that the predictive power of HMV for TCSR is not limited to protein-coding genes in CD4+ T cells, as we successfully predicted TCSR targeted genes in muscle cells, as well as microRNA genes with expression specific to CD4+ T cells, by the same classifier which was trained on HMV data of protein-coding genes in CD4+ T cells. Conclusion: We have begun to understand the HMV patterns that guide gene expression in both tissue/cell-type specific and ubiquitous manners.

  2. FocusHeuristics - expression-data-driven network optimization and disease gene prediction.

    Science.gov (United States)

    Ernst, Mathias; Du, Yang; Warsow, Gregor; Hamed, Mohamed; Endlich, Nicole; Endlich, Karlhans; Murua Escobar, Hugo; Sklarz, Lisa-Madeleine; Sender, Sina; Junghanß, Christian; Möller, Steffen; Fuellen, Georg; Struckmann, Stephan

    2017-02-16

    To identify genes contributing to disease phenotypes remains a challenge for bioinformatics. Static knowledge on biological networks is often combined with the dynamics observed in gene expression levels over disease development, to find markers for diagnostics and therapy, and also putative disease-modulatory drug targets and drugs. The basis of current methods ranges from a focus on expression-levels (Limma) to concentrating on network characteristics (PageRank, HITS/Authority Score), and both (DeMAND, Local Radiality). We present an integrative approach (the FocusHeuristics) that is thoroughly evaluated based on public expression data and molecular disease characteristics provided by DisGeNet. The FocusHeuristics combines three scores, i.e. the log fold change and another two, based on the sum and difference of log fold changes of genes/proteins linked in a network. A gene is kept when one of the scores to which it contributes is above a threshold. Our FocusHeuristics is both, a predictor for gene-disease-association and a bioinformatics method to reduce biological networks to their disease-relevant parts, by highlighting the dynamics observed in expression data. The FocusHeuristics is slightly, but significantly better than other methods by its more successful identification of disease-associated genes measured by AUC, and it delivers mechanistic explanations for its choice of genes.
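
    The selection rule described above can be sketched in a few lines of Python. This is a reading of the abstract rather than the published implementation: taking absolute values of the scores, using a separate threshold per score, and the gene names and log fold changes are all assumptions made for illustration.

```python
def focus_genes(log_fc, edges, node_thr, sum_thr, diff_thr):
    """Keep genes for which at least one of three scores exceeds its threshold.

    Scores (absolute values are an assumption of this sketch):
      1. the gene's own log fold change,
      2. the sum of log fold changes of two linked genes,
      3. the difference of log fold changes of two linked genes.
    """
    kept = {gene for gene, value in log_fc.items() if abs(value) > node_thr}
    for a, b in edges:
        if a not in log_fc or b not in log_fc:
            continue  # skip links to genes without expression data
        link_sum = abs(log_fc[a] + log_fc[b])
        link_diff = abs(log_fc[a] - log_fc[b])
        if link_sum > sum_thr or link_diff > diff_thr:
            kept.update((a, b))  # both endpoints contribute to the link score
    return kept


# Hypothetical log fold changes and network links.
log_fc = {"A": 2.0, "B": 0.2, "C": -0.9, "D": 0.8, "E": 0.1}
edges = [("C", "D"), ("B", "E"), ("A", "B")]
selected = focus_genes(log_fc, edges, node_thr=1.5, sum_thr=1.5, diff_thr=1.5)
print(sorted(selected))
```

    Gene A passes on its own fold change, B is retained through its link to A (large sum), and C and D are retained through their opposing regulation (large difference), while E is dropped; the surviving edges form the reduced, disease-focused subnetwork.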

  3. A computational method based on the integration of heterogeneous networks for predicting disease-gene associations.

    Directory of Open Access Journals (Sweden)

    Xingli Guo

    The identification of disease-causing genes is a fundamental challenge in human health and of great importance in improving medical care, and provides a better understanding of gene functions. Recent computational approaches based on the interactions among human proteins and disease similarities have shown their power in tackling the issue. In this paper, a novel systematic and global method that integrates two heterogeneous networks for prioritizing candidate disease-causing genes is provided, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein interactions. In this method, the association score function between a query disease and a candidate gene is defined as the weighted sum of all the association scores between similar diseases and neighbouring genes. Moreover, the topological correlation of these two heterogeneous networks can be incorporated into the definition of the score function, and finally an iterative algorithm is designed for this issue. This method was tested with 10-fold cross-validation on all 1,126 diseases that have at least one known causal gene, and it ranked the correct gene as one of the top ten in 622 of all the 1,428 cases, significantly outperforming a state-of-the-art method called PRINCE. The results brought about by this method were applied to study three multi-factorial disorders: breast cancer, Alzheimer disease and diabetes mellitus type 2, and some suggestions of novel causal genes and candidate disease-causing subnetworks were provided for further investigation.
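
    The weighted-sum score can be illustrated with a single, non-iterative pass in Python. This sketch is an assumption-laden simplification: it multiplies a disease-similarity weight by a gene-neighbourhood weight for each known association, and it omits both the topological-correlation term and the iterative algorithm mentioned in the abstract. All identifiers and numbers are hypothetical.

```python
def association_score(disease, gene, disease_sim, gene_adj, known):
    """Weighted sum of known associations between similar diseases and neighbouring genes."""
    total = 0.0
    for other_disease, sim in disease_sim.get(disease, {}).items():
        for neighbour, weight in gene_adj.get(gene, {}).items():
            total += sim * weight * known.get((other_disease, neighbour), 0.0)
    return total


def rank_candidates(disease, candidates, disease_sim, gene_adj, known):
    """Order candidate genes by decreasing association score."""
    return sorted(
        candidates,
        key=lambda g: association_score(disease, g, disease_sim, gene_adj, known),
        reverse=True,
    )


# Hypothetical disease similarities, protein-interaction weights, and known associations.
disease_sim = {"D1": {"D2": 0.8, "D3": 0.5}}
gene_adj = {"G1": {"G2": 1.0, "G3": 0.5}, "G4": {"G5": 1.0}}
known = {("D2", "G2"): 1.0, ("D3", "G3"): 1.0}

score_g1 = association_score("D1", "G1", disease_sim, gene_adj, known)
ranking = rank_candidates("D1", ["G4", "G1"], disease_sim, gene_adj, known)
print(score_g1, ranking)
```

    Candidate G1 scores higher because its network neighbours are already linked to diseases similar to the query disease, which is the guilt-by-association intuition the abstract describes.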

  4. Comparison of loline alkaloid gene clusters across fungal endophytes: predicting the co-regulatory sequence motifs and the evolutionary history.

    Science.gov (United States)

    Kutil, Brandi L; Greenwald, Charles; Liu, Gang; Spiering, Martin J; Schardl, Christopher L; Wilkinson, Heather H

    2007-10-01

    LOL, a fungal secondary metabolite gene cluster found in Epichloë and Neotyphodium species, is responsible for production of insecticidal loline alkaloids. To analyze the genetic architecture and to predict the evolutionary history of LOL, we compared five clusters from four fungal species (single clusters from Epichloë festucae, Neotyphodium sp. PauTG-1, Neotyphodium coenophialum, and two clusters we previously characterized in Neotyphodium uncinatum). Using PhyloCon to compare putative lol gene promoter regions, we have identified four motifs conserved across the lol genes in all five clusters. Each motif has significant similarity to known fungal transcription factor binding sites in the TRANSFAC database. Conservation of these motifs is further support for the hypothesis that the lol genes are co-regulated. Interestingly, the history of asexual Neotyphodium spp. includes multiple interspecific hybridization events. Comparing clusters from three Neotyphodium species and E. festucae allowed us to determine which Epichloë ancestors are the most likely contributors of LOL in these asexual species. For example, while no present day Epichloë typhina isolates are known to produce lolines, our data support the hypothesis that the E. typhina ancestor(s) of three asexual endophyte species contained a LOL gene cluster. Thus, these data support a model of evolution in which the polymorphism in loline alkaloid production phenotypes among endophyte species is likely due to the loss of the trait over time.

  5. Transcription factor binding site enrichment analysis predicts drivers of altered gene expression in nonalcoholic steatohepatitis

    Czech Academy of Sciences Publication Activity Database

    Lake, A.D.; Chaput, A.L.; Novák, Petr; Cherrington, N.J.; Smith, C.L.

    2016-01-01

    Vol. 122, December 15 (2016), pp. 62-71 ISSN 0006-2952 Institutional support: RVO:60077344 Keywords: Transcription factor * Liver * Gene expression * Bioinformatics Subject RIV: CE - Biochemistry Impact factor: 4.581, year: 2016

  6. An Individual-Based Diploid Model Predicts Limited Conditions Under Which Stochastic Gene Expression Becomes Advantageous

    KAUST Repository

    Matsumoto, Tomotaka; Mineta, Katsuhiko; Osada, Naoki; Araki, Hitoshi

    2015-01-01

    Recent studies suggest the existence of a stochasticity in gene expression (SGE) in many organisms, and its non-negligible effect on their phenotype and fitness. To date, however, how SGE affects the key parameters of population genetics

  7. PREDICTION OF THE COURSE OF OSTEOARTHROSIS FROM mTOR (MAMMALIAN TARGET OF RAPAMYCIN) GENE EXPRESSION

    Directory of Open Access Journals (Sweden)

    E V Chetina

    2012-01-01

    Results. Analysis of gene expression in the outpatients with OA identified two subgroups: in one subgroup (n = 13), mTOR expression was considerably lower than in the control group; the expression of ATG1 and p21 did not differ greatly from the control and that of caspase 3 and TNF-α was significantly higher. The other outpatients (n = 20) and all the examined patients needing endoprosthetic replacement were found to have higher gene expression of mTOR, ATG1, p21, caspase 3, and TNF-α than the control group. Before endoprosthetic replacement, severe joint destruction in patients with OA was associated with enhanced gene expression of mTOR, ATG1, p21, and caspase 3. Conclusion. In early-stage disease, increased mTOR gene expression may serve as a prognostic marker of the severity of the disease and articular cartilage destruction.

  8. TBC2target: A Resource of Predicted Target Genes of Tea Bioactive Compounds

    Directory of Open Access Journals (Sweden)

    Shihua Zhang

    2018-02-01

    Tea is one of the most popular non-alcoholic beverages consumed worldwide. Numerous bioactive constituents of tea were confirmed to possess healthy benefits via the mechanisms of regulating gene expressions or protein activities. However, a complete interacting profile between tea bioactive compounds (TBCs) and their target genes is lacking, which put an obstacle in the study of healthy function of tea. To fill this gap, we developed a database of target genes of TBCs (TBC2target, http://camellia.ahau.edu.cn/TBC2target) based on a pharmacophore mapping approach. In TBC2target, 6,226 interactions between 240 TBCs and 673 target genes were documented. TBC2target contains detailed information about each interacting entry, such as TBC, CAS number, PubChem CID, source of compound (e.g., green, black), compound type, target gene(s) of TBC, gene symbol, gene ID, ENSEMBL ID, PDB ID, TBC bioactivity and the reference. Using the TBC-target associations, we constructed a bipartite network and provided users the global network and local sub-network visualization and topological analyses. The entire database is free for online browsing, searching and downloading. In addition, TBC2target provides a BLAST search function to facilitate use of the database. The particular strengths of TBC2target are the inclusion of the comprehensive TBC-target interactions, and the capacity to visualize and analyze the interacting networks, which may help uncovering the beneficial effects of tea on human health as a central resource in tea health community.

  9. TBC2target: A Resource of Predicted Target Genes of Tea Bioactive Compounds.

    Science.gov (United States)

    Zhang, Shihua; Zhang, Liang; Wang, Yijun; Yang, Jian; Liao, Mingzhi; Bi, Shoudong; Xie, Zhongwen; Ho, Chi-Tang; Wan, Xiaochun

    2018-01-01

    Tea is one of the most popular non-alcoholic beverages consumed worldwide. Numerous bioactive constituents of tea were confirmed to possess healthy benefits via the mechanisms of regulating gene expressions or protein activities. However, a complete interacting profile between tea bioactive compounds (TBCs) and their target genes is lacking, which put an obstacle in the study of healthy function of tea. To fill this gap, we developed a database of target genes of TBCs (TBC2target, http://camellia.ahau.edu.cn/TBC2target) based on a pharmacophore mapping approach. In TBC2target, 6,226 interactions between 240 TBCs and 673 target genes were documented. TBC2target contains detailed information about each interacting entry, such as TBC, CAS number, PubChem CID, source of compound (e.g., green, black), compound type, target gene(s) of TBC, gene symbol, gene ID, ENSEMBL ID, PDB ID, TBC bioactivity and the reference. Using the TBC-target associations, we constructed a bipartite network and provided users the global network and local sub-network visualization and topological analyses. The entire database is free for online browsing, searching and downloading. In addition, TBC2target provides a BLAST search function to facilitate use of the database. The particular strengths of TBC2target are the inclusion of the comprehensive TBC-target interactions, and the capacity to visualize and analyze the interacting networks, which may help uncovering the beneficial effects of tea on human health as a central resource in tea health community.

  10. Predicting multi-level drug response with gene expression profile in multiple myeloma using hierarchical ordinal regression.

    Science.gov (United States)

    Zhang, Xinyan; Li, Bingzong; Han, Huiying; Song, Sha; Xu, Hongxia; Hong, Yating; Yi, Nengjun; Zhuang, Wenzhuo

    2018-05-10

    Multiple myeloma (MM), like other cancers, is caused by the accumulation of genetic abnormalities. Heterogeneity exists in the patients' response to treatments, for example, bortezomib. This urges efforts to identify biomarkers from numerous molecular features and build predictive models for identifying patients that can benefit from a certain treatment scheme. However, previous studies treated the multi-level ordinal drug response as a binary response where only responsive and non-responsive groups are considered. It is desirable to directly analyze the multi-level drug response, rather than combining the response to two groups. In this study, we present a novel method to identify significantly associated biomarkers and then develop ordinal genomic classifier using the hierarchical ordinal logistic model. The proposed hierarchical ordinal logistic model employs the heavy-tailed Cauchy prior on the coefficients and is fitted by an efficient quasi-Newton algorithm. We apply our hierarchical ordinal regression approach to analyze two publicly available datasets for MM with five-level drug response and numerous gene expression measures. Our results show that our method is able to identify genes associated with the multi-level drug response and to generate powerful predictive models for predicting the multi-level response. The proposed method allows us to jointly fit numerous correlated predictors and thus build efficient models for predicting the multi-level drug response. The predictive model for the multi-level drug response can be more informative than the previous approaches. Thus, the proposed approach provides a powerful tool for predicting multi-level drug response and has important impact on cancer studies.
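
    To make the modelling idea concrete, the Python sketch below shows how a cumulative-logit (proportional-odds) model turns a linear predictor into probabilities for a five-level response, together with the log-density of a Cauchy prior on the coefficients. The abstract does not spell out the parameterization, so this standard form is an assumption, the quasi-Newton fitting step is not shown, and the coefficients, cutpoints, and expression values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(x, beta, cutpoints):
    """Category probabilities under a cumulative-logit model.

    P(y <= k) = sigmoid(cutpoints[k] - x . beta); cutpoints must be increasing.
    With K - 1 cutpoints the model yields K ordered categories.
    """
    eta = sum(b * v for b, v in zip(beta, x))
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


def cauchy_log_prior(beta, scale=1.0):
    """Sum of log-densities of independent Cauchy(0, scale) priors on the coefficients."""
    return sum(-math.log(math.pi * scale * (1.0 + (b / scale) ** 2)) for b in beta)


# Hypothetical example: two gene expression values, four cutpoints, five response levels.
x = [0.4, -1.2]
beta = [0.9, 0.3]
cutpoints = [-2.0, -1.0, 1.0, 2.0]
probs = ordinal_probs(x, beta, cutpoints)
print(probs, cauchy_log_prior(beta))
```

    Keeping all five levels lets the fitted model distinguish, for example, a partial response from a complete one, which a binary responder versus non-responder split would merge.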

  11. Characterization and detection of a widely distributed gene cluster that predicts anaerobic choline utilization by human gut bacteria.

    Science.gov (United States)

    Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A; Marks, Jonathan A; Haiser, Henry J; Turnbaugh, Peter J; Balskus, Emily P

    2015-04-14

    Elucidation of the molecular mechanisms underlying the human gut microbiota's effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. Anaerobic choline utilization is a bacterial metabolic activity that occurs in the human gut and is linked to multiple diseases. 
While bacterial genes responsible for

  12. Identification of epigenetically regulated genes that predict patient outcome in neuroblastoma

    International Nuclear Information System (INIS)

    Carén, Helena; Djos, Anna; Nethander, Maria; Sjöberg, Rose-Marie; Kogner, Per; Enström, Camilla; Nilsson, Staffan; Martinsson, Tommy

    2011-01-01

    Epigenetic mechanisms such as DNA methylation and histone modifications are important regulators of gene expression and are frequently involved in silencing tumor suppressor genes. In order to identify genes that are epigenetically regulated in neuroblastoma tumors, we treated four neuroblastoma cell lines with the demethylating agent 5-Aza-2'-deoxycytidine (5-Aza-dC) either separately or in conjunction with the histone deacetylase inhibitor trichostatin A (TSA). Expression was analyzed using whole-genome expression arrays to identify genes activated by the treatment. These data were then combined with data from genome-wide DNA methylation arrays to identify candidate genes silenced in neuroblastoma due to DNA methylation. We present eight genes (KRT19, PRKCDBP, SCNN1A, POU2F2, TGFBI, COL1A2, DHRS3 and DUSP23) that are methylated in neuroblastoma, most of them not previously reported as such, some of which also distinguish between biological subsets of neuroblastoma tumors. Differential methylation was observed for the genes SCNN1A (p < 0.001), PRKCDBP (p < 0.001) and KRT19 (p < 0.01). Among these, the mRNA expression of KRT19 and PRKCDBP was significantly lower in patients that have died from the disease compared with patients with no evidence of disease (fold change -8.3, p = 0.01 for KRT19 and fold change -2.4, p = 0.04 for PRKCDBP). In our study, a low methylation frequency of SCNN1A, PRKCDBP and KRT19 is significantly associated with favorable outcome in neuroblastoma. It is likely that analysis of specific DNA methylation will be one of several methods in future patient therapy stratification protocols for treatment of childhood neuroblastomas

  13. Gene

    Data.gov (United States)

    U.S. Department of Health & Human Services — Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes,...

  14. Exceptions to the rule: case studies in the prediction of pathogenicity for genetic variants in hereditary cancer genes.

    Science.gov (United States)

    Rosenthal, E T; Bowles, K R; Pruss, D; van Kan, A; Vail, P J; McElroy, H; Wenstrup, R J

    2015-12-01

    Based on current consensus guidelines and standard practice, many genetic variants detected in clinical testing are classified as disease causing based on their predicted impact on the normal expression or function of the gene in the absence of additional data. However, our laboratory has identified a subset of such variants in hereditary cancer genes for which compelling contradictory evidence emerged after the initial evaluation following the first observation of the variant. Three representative examples of variants in BRCA1, BRCA2 and MSH2 that are predicted to disrupt splicing, prematurely truncate the protein, or remove the start codon were evaluated for pathogenicity by analyzing clinical data with multiple classification algorithms. Available clinical data for all three variants contradicts the expected pathogenic classification. These variants illustrate potential pitfalls associated with standard approaches to variant classification as well as the challenges associated with monitoring data, updating classifications, and reporting potentially contradictory interpretations to the clinicians responsible for translating test outcomes to appropriate clinical action. It is important to address these challenges now as the model for clinical testing moves toward the use of large multi-gene panels and whole exome/genome analysis, which will dramatically increase the number of genetic variants identified. © 2015 The Authors. Clinical Genetics published by John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

    Science.gov (United States)

    Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

    2016-02-01

    Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
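    The voting step and the reported sensitivity/specificity figures can be illustrated with a minimal sketch. The gene names, thresholds and rule forms below are hypothetical placeholders, not the rules evolved in the study; the sketch only shows how several single-gene rules might be combined by majority vote and then scored against known outcomes.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Expression = Dict[str, float]          # gene name -> expression value
Rule = Callable[[Expression], bool]    # True means "votes for recurrence"

# Hypothetical rules: names and cut-offs are illustrative assumptions only.
RULES: List[Rule] = [
    lambda s: s["GENE_A"] > 1.0,
    lambda s: s["GENE_B"] < 0.5,
    lambda s: s["GENE_C"] > 2.0,
]


def predict_recurrence(sample: Expression, rules: Sequence[Rule] = RULES) -> bool:
    """Predict recurrence when a strict majority of rules vote for it."""
    votes = sum(1 for rule in rules if rule(sample))
    return votes * 2 > len(rules)


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); NaN if a class is absent."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy data, invented for illustration.
samples = [
    {"GENE_A": 1.5, "GENE_B": 0.2, "GENE_C": 0.1},
    {"GENE_A": 0.5, "GENE_B": 0.9, "GENE_C": 3.0},
    {"GENE_A": 2.0, "GENE_B": 0.1, "GENE_C": 2.5},
    {"GENE_A": 0.1, "GENE_B": 0.4, "GENE_C": 0.2},
]
observed = [True, False, False, True]
predictions = [predict_recurrence(s) for s in samples]
```

    Scoring a held-out test set with the same function, separately from the training set, is what produces the paired training/test figures quoted in the abstract.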

  16. Gene expression signature in organized and growth arrested mammary acini predicts good outcome in breast cancer

    Energy Technology Data Exchange (ETDEWEB)

    Fournier, Marcia V.; Martin, Katherine J.; Kenny, Paraic A.; Xhaja, Kris; Bosch, Irene; Yaswen, Paul; Bissell, Mina J.

    2006-02-08

    To understand how non-malignant human mammary epithelial cells (HMEC) transit from a disorganized proliferating to an organized growth arrested state, and to relate this process to the changes that occur in breast cancer, we studied gene expression changes in non-malignant HMEC grown in three-dimensional cultures, and in a previously published panel of microarray data for 295 breast cancer samples. We hypothesized that the gene expression pattern of organized and growth arrested mammary acini would share similarities with breast tumors with good prognoses. Using Affymetrix HG-U133A microarrays, we analyzed the expression of 22,283 gene transcripts in two HMEC cell lines, 184 (finite life span) and HMT3522 S1 (immortal non-malignant), on successive days post-seeding in a laminin-rich extracellular matrix assay. Both HMECs underwent growth arrest in G0/G1 and differentiated into polarized acini between days 5 and 7. We identified gene expression changes with the same temporal pattern in both lines. We show that genes that are significantly lower in the organized, growth arrested HMEC than in their proliferating counterparts can be used to classify breast cancer patients into poor and good prognosis groups with high accuracy. This study represents a novel unsupervised approach to identifying breast cancer markers that may be of use clinically.

  17. Gene Expression Differences Predict Treatment Outcome of Merkel Cell Carcinoma Patients

    Directory of Open Access Journals (Sweden)

    Loren Masterson

    2014-01-01

    Due to the rarity of Merkel cell carcinoma (MCC), prospective clinical trials have not been practical. This study aimed to identify biomarkers with prognostic significance. While sixty-two patients were identified who were treated for MCC at our institution, only seventeen patients had adequate formalin-fixed paraffin-embedded archival tissue and follow-up to be included in the study. Patients were stratified into good, moderate, or poor prognosis. Laser capture microdissection was used to isolate tumor cells for subsequent RNA isolation and gene expression analysis with Affymetrix GeneChip Human Exon 1.0 ST arrays. Among the 191 genes demonstrating significant differential expression between prognostic groups, keratin 20 and neurofilament protein have previously been identified in studies of MCC and were significantly upregulated in tumors from patients with a poor prognosis. Immunohistochemistry further established that keratin 20 was overexpressed in the poor prognosis tumors. In addition, novel genes of interest such as phospholipase A2 group X, kinesin family member 3A, tumor protein D52, mucin 1, and KIT were upregulated in specimens from patients with poor prognosis. Our pilot study identified several gene expression differences which could be used in the future as prognostic biomarkers in MCC patients.

  18. Prediction of graft-versus-host disease in humans by donor gene-expression profiling.

    Directory of Open Access Journals (Sweden)

    Chantal Baron

    2007-01-01

    BACKGROUND: Graft-versus-host disease (GVHD) results from recognition of host antigens by donor T cells following allogeneic hematopoietic cell transplantation (AHCT). Notably, histoincompatibility between donor and recipient is necessary but not sufficient to elicit GVHD. Therefore, we tested the hypothesis that some donors may be "stronger alloresponders" than others, and consequently more likely to elicit GVHD. METHODS AND FINDINGS: To this end, we measured the gene-expression profiles of CD4(+) and CD8(+) T cells from 50 AHCT donors with microarrays. We report that pre-AHCT gene-expression profiling segregates donors whose recipient suffered from GVHD or not. Using quantitative PCR, established statistical tests, and analysis of multiple independent training-test datasets, we found that for chronic GVHD the "dangerous donor" trait (occurrence of GVHD in the recipient) is under polygenic control and is shaped by the activity of genes that regulate transforming growth factor-beta signaling and cell proliferation. CONCLUSIONS: These findings strongly suggest that the donor gene-expression profile has a dominant influence on the occurrence of GVHD in the recipient. The ability to discriminate strong and weak alloresponders using gene-expression profiling could pave the way to personalized transplantation medicine.

  19. Identification of Gene Networks for Residual Feed Intake in Angus Cattle Using Genomic Prediction and RNA-seq.

    Science.gov (United States)

    Weber, Kristina L; Welly, Bryan T; Van Eenennaam, Alison L; Young, Amy E; Porto-Neto, Laercio R; Reverter, Antonio; Rincon, Gonzalo

    2016-01-01

    Improvement in feed conversion efficiency can improve the sustainability of beef cattle production, but genomic selection for feed efficiency affects many underlying molecular networks and physiological traits. This study describes the differences between steer progeny of two influential Angus bulls with divergent genomic predictions for residual feed intake (RFI). Eight steer progeny of each sire were phenotyped for growth and feed intake from 8 mo. of age (average BW 254 kg, with a mean difference between sire groups of 4.8 kg) until slaughter at 14-16 mo. of age (average BW 534 kg, sire group difference of 28.8 kg). Terminal samples from pituitary gland, skeletal muscle, liver, adipose, and duodenum were collected from each steer for transcriptome sequencing. Gene expression networks were derived using partial correlation and information theory (PCIT), including differentially expressed (DE) genes, tissue specific (TS) genes, transcription factors (TF), and genes associated with RFI from a genome-wide association study (GWAS). Relative to progeny of the high RFI sire, progeny of the low RFI sire had -0.56 kg/d finishing period RFI (P = 0.05), -1.08 finishing period feed conversion ratio (P = 0.01), +3.3 kg^0.75 finishing period metabolic mid-weight (MMW; P = 0.04), +28.8 kg final body weight (P = 0.01), -12.9 feed bunk visits per day (P = 0.02) with +0.60 min/visit duration (P = 0.01), and +0.0045 carcass specific gravity (weight in air/weight in air-weight in water, a predictor of carcass fat content; P = 0.03). RNA-seq identified 633 DE genes between sire groups among 17,016 expressed genes. PCIT analysis identified >115,000 significant co-expression correlations between genes and 25 TF hubs, i.e. controllers of clusters of DE, TS, and GWAS SNP genes. Pathway analysis suggests low RFI bull progeny possess heightened gut inflammation and reduced fat deposition. This multi-omics analysis shows how differences in RFI genomic breeding values can impact other

  20. Identification of Gene Networks for Residual Feed Intake in Angus Cattle Using Genomic Prediction and RNA-seq.

    Directory of Open Access Journals (Sweden)

    Kristina L Weber

    Improvement in feed conversion efficiency can improve the sustainability of beef cattle production, but genomic selection for feed efficiency affects many underlying molecular networks and physiological traits. This study describes the differences between steer progeny of two influential Angus bulls with divergent genomic predictions for residual feed intake (RFI). Eight steer progeny of each sire were phenotyped for growth and feed intake from 8 mo. of age (average BW 254 kg, with a mean difference between sire groups of 4.8 kg) until slaughter at 14-16 mo. of age (average BW 534 kg, sire group difference of 28.8 kg). Terminal samples from pituitary gland, skeletal muscle, liver, adipose, and duodenum were collected from each steer for transcriptome sequencing. Gene expression networks were derived using partial correlation and information theory (PCIT), including differentially expressed (DE) genes, tissue specific (TS) genes, transcription factors (TF), and genes associated with RFI from a genome-wide association study (GWAS). Relative to progeny of the high RFI sire, progeny of the low RFI sire had -0.56 kg/d finishing period RFI (P = 0.05), -1.08 finishing period feed conversion ratio (P = 0.01), +3.3 kg^0.75 finishing period metabolic mid-weight (MMW; P = 0.04), +28.8 kg final body weight (P = 0.01), -12.9 feed bunk visits per day (P = 0.02) with +0.60 min/visit duration (P = 0.01), and +0.0045 carcass specific gravity (weight in air/weight in air-weight in water, a predictor of carcass fat content; P = 0.03). RNA-seq identified 633 DE genes between sire groups among 17,016 expressed genes. PCIT analysis identified >115,000 significant co-expression correlations between genes and 25 TF hubs, i.e. controllers of clusters of DE, TS, and GWAS SNP genes. Pathway analysis suggests low RFI bull progeny possess heightened gut inflammation and reduced fat deposition. This multi-omics analysis shows how differences in RFI genomic breeding values can impact other

  1. Gene expression profiles in paraffin-embedded core biopsy tissue predict response to chemotherapy in women with locally advanced breast cancer.

    Science.gov (United States)

    Gianni, Luca; Zambetti, Milvia; Clark, Kim; Baker, Joffre; Cronin, Maureen; Wu, Jenny; Mariani, Gabriella; Rodriguez, Jaime; Carcangiu, Marialuisa; Watson, Drew; Valagussa, Pinuccia; Rouzier, Roman; Symmans, W Fraser; Ross, Jeffrey S; Hortobagyi, Gabriel N; Pusztai, Lajos; Shak, Steven

    2005-10-10

    We sought to identify gene expression markers that predict the likelihood of chemotherapy response. We also tested whether chemotherapy response is correlated with the 21-gene Recurrence Score assay that quantifies recurrence risk. Patients with locally advanced breast cancer received neoadjuvant paclitaxel and doxorubicin. RNA was extracted from the pretreatment formalin-fixed paraffin-embedded core biopsies. The expression of 384 genes was quantified using reverse transcriptase polymerase chain reaction and correlated with pathologic complete response (pCR). The performance of genes predicting for pCR was tested in patients from an independent neoadjuvant study where gene expression was obtained using DNA microarrays. Of 89 assessable patients (mean age, 49.9 years; mean tumor size, 6.4 cm), 11 (12%) had a pCR. Eighty-six genes correlated with pCR (unadjusted P < .05); pCR was more likely with higher expression of proliferation-related genes and immune-related genes, and with lower expression of estrogen receptor (ER) -related genes. In 82 independent patients treated with neoadjuvant paclitaxel and doxorubicin, DNA microarray data were available for 79 of the 86 genes. In univariate analysis, 24 genes correlated with pCR with P < .05 (false discovery, four genes) and 32 genes showed correlation with P < .1 (false discovery, eight genes). The Recurrence Score was positively associated with the likelihood of pCR (P = .005), suggesting that the patients who are at greatest recurrence risk are more likely to have chemotherapy benefit. Quantitative expression of ER-related genes, proliferation genes, and immune-related genes are strong predictors of pCR in women with locally advanced breast cancer receiving neoadjuvant anthracyclines and paclitaxel.
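    The false-discovery counts quoted above (four genes at P < .05, eight at P < .1) appear consistent with the usual back-of-envelope expectation for uncorrected univariate tests: number of tests multiplied by the significance threshold, here with the 79 genes available in the validation set. That reading is an assumption; the abstract does not state how the counts were derived. A short sketch of the arithmetic:

```python
def expected_false_discoveries(n_tests: int, alpha: float) -> float:
    """Expected number of null hypotheses passing an uncorrected threshold.

    Assumes every tested gene is truly null, which gives an upper-bound
    style estimate rather than a formal false discovery rate.
    """
    return n_tests * alpha


# 79 genes were testable in the independent microarray data set.
at_005 = expected_false_discoveries(79, 0.05)   # about 3.95
at_010 = expected_false_discoveries(79, 0.10)   # about 7.9
```

    Under this reading, roughly 4 of the 24 genes and 8 of the 32 genes could be chance findings.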

  2. Identifying Growth Conditions for Nicotiana benthimiana Resulting in Predictable Gene Expression of Promoter-Gus Fusion

    Science.gov (United States)

    Sandoval, V.; Barton, K.; Longhurst, A.

    2012-12-01

    Revoluta (Rev) is a transcription factor that establishes leaf polarity in Arabidopsis thaliana. Through previous work in Dr. Barton's Lab, it is known that Revoluta binds to the ZPR3 promoter, thus activating the ZPR3 gene product in Arabidopsis thaliana. Using this knowledge, two separate DNA constructs were made, one carrying the rev gene and the other carrying the ZPR3 promoter fused with the GUS gene. When inoculated in Nicotiana benthamiana (tobacco), the pMDC32 plasmid produces the Rev protein. Rev binds to the ZPR3 promoter, thereby activating the transcription of the GUS gene, which can only be expressed in the presence of Rev. When GUS protein comes in contact with X-Gluc, it produces the blue stain seen (see Figure 1). In the past, variability has been seen in GUS expression in tobacco; therefore, we hypothesized that changing the growing conditions and leaf age might improve how well it is expressed.

  3. Prediction of the Ebola Virus Infection Related Human Genes Using Protein-Protein Interaction Network.

    Science.gov (United States)

    Cao, HuanHuan; Zhang, YuHang; Zhao, Jia; Zhu, Liucun; Wang, Yi; Li, JiaRui; Feng, Yuan-Ming; Zhang, Ning

    2017-01-01

    Ebola hemorrhagic fever (EHF) is caused by Ebola virus (EBOV). It is reported that humans can be infected by EBOV, with a high fatality rate. However, the factors linking EBOV and its host remain ambiguous. According to the "guilt by association" (GBA) principle, proteins interacting with each other are very likely to have similar or identical functions. Based on this assumption, we tried to obtain EBOV infection-related human genes in a protein-protein interaction network using Dijkstra's algorithm. We hope it could contribute to the discovery of novel effective treatments. Finally, 15 genes were selected as potential EBOV infection-related human genes. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
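    The abstract does not say exactly how Dijkstra's algorithm was applied. One common reading of the guilt-by-association idea, assumed here, is to find shortest paths between known infection-related "seed" genes in a weighted protein-protein interaction network and treat the interior nodes of those paths as candidates. The protein names and edge weights below are invented for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build an undirected weighted graph from (a, b, weight) triples."""
    graph: Graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def dijkstra_path(graph: Graph, source: str, target: str) -> List[str]:
    """Return the lowest-cost path from source to target, or [] if none."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue            # stale heap entry
        done.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical toy network; lower weight = stronger interaction evidence.
PPI = build_graph([
    ("SEED1", "P1", 0.2),
    ("P1", "SEED2", 0.3),
    ("SEED1", "P2", 0.9),
    ("P2", "SEED2", 0.9),
])
path = dijkstra_path(PPI, "SEED1", "SEED2")
candidates = set(path[1:-1])    # interior nodes only
```

    In a real analysis the candidate set would normally be filtered further, for example by a permutation test, before any gene is reported.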

  4. Prediction of novel target genes and pathways involved in bevacizumab-resistant colorectal cancer

    Science.gov (United States)

    Makondi, Precious Takondwa; Lee, Chia-Hwa; Huang, Chien-Yu; Chu, Chi-Ming; Chang, Yu-Jia

    2018-01-01

    Bevacizumab combined with cytotoxic chemotherapy is the backbone of metastatic colorectal cancer (mCRC) therapy; however, its treatment efficacy is hampered by therapeutic resistance. Therefore, understanding the mechanisms underlying bevacizumab resistance is crucial to increasing the therapeutic efficacy of bevacizumab. The Gene Expression Omnibus (GEO) database (dataset, GSE86525) was used to identify the key genes and pathways involved in bevacizumab-resistant mCRC. The GEO2R web tool was used to identify differentially expressed genes (DEGs). Functional and pathway enrichment analyses of the DEGs were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Protein–protein interaction (PPI) networks were established using the Search Tool for the Retrieval of Interacting Genes/Proteins database (STRING) and visualized using Cytoscape software. A total of 124 DEGs were obtained, 57 of which were upregulated and 67 were downregulated. PPI network analysis showed that seven upregulated genes and nine downregulated genes exhibited high PPI degrees. In the functional enrichment, the DEGs were mainly enriched in negative regulation of phosphate metabolic process and positive regulation of cell cycle process gene ontologies (GOs); the enriched pathways were the phosphoinositide 3-kinase-serine/threonine kinase signaling pathway, bladder cancer, and microRNAs in cancer. Cyclin-dependent kinase inhibitor 1A (CDKN1A), toll-like receptor 4 (TLR4), CD19 molecule (CD19), breast cancer 1, early onset (BRCA1), platelet-derived growth factor subunit A (PDGFA), and matrix metallopeptidase 1 (MMP1) were the DEGs involved in the pathways and the PPIs. The clinical validation of the DEGs in mCRC (TNM clinical stages 3 and 4) revealed that high PDGFA expression levels were associated with poor overall survival, whereas high BRCA1 and MMP1 expression levels were associated with favorable progression-free survival (PFS). The identified genes and pathways

  5. Prediction of novel target genes and pathways involved in bevacizumab-resistant colorectal cancer.

    Directory of Open Access Journals (Sweden)

    Precious Takondwa Makondi

    Bevacizumab combined with cytotoxic chemotherapy is the backbone of metastatic colorectal cancer (mCRC) therapy; however, its treatment efficacy is hampered by therapeutic resistance. Therefore, understanding the mechanisms underlying bevacizumab resistance is crucial to increasing the therapeutic efficacy of bevacizumab. The Gene Expression Omnibus (GEO) database (dataset, GSE86525) was used to identify the key genes and pathways involved in bevacizumab-resistant mCRC. The GEO2R web tool was used to identify differentially expressed genes (DEGs). Functional and pathway enrichment analyses of the DEGs were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Protein-protein interaction (PPI) networks were established using the Search Tool for the Retrieval of Interacting Genes/Proteins database (STRING) and visualized using Cytoscape software. A total of 124 DEGs were obtained, 57 of which were upregulated and 67 were downregulated. PPI network analysis showed that seven upregulated genes and nine downregulated genes exhibited high PPI degrees. In the functional enrichment, the DEGs were mainly enriched in negative regulation of phosphate metabolic process and positive regulation of cell cycle process gene ontologies (GOs); the enriched pathways were the phosphoinositide 3-kinase-serine/threonine kinase signaling pathway, bladder cancer, and microRNAs in cancer. Cyclin-dependent kinase inhibitor 1A (CDKN1A), toll-like receptor 4 (TLR4), CD19 molecule (CD19), breast cancer 1, early onset (BRCA1), platelet-derived growth factor subunit A (PDGFA), and matrix metallopeptidase 1 (MMP1) were the DEGs involved in the pathways and the PPIs. The clinical validation of the DEGs in mCRC (TNM clinical stages 3 and 4) revealed that high PDGFA expression levels were associated with poor overall survival, whereas high BRCA1 and MMP1 expression levels were associated with favorable progression-free survival (PFS). The identified genes and pathways

  6. Comparative transcriptome analyses of three medicinal Forsythia species and prediction of candidate genes involved in secondary metabolisms.

    Science.gov (United States)

    Sun, Luchao; Rai, Amit; Rai, Megha; Nakamura, Michimi; Kawano, Noriaki; Yoshimatsu, Kayo; Suzuki, Hideyuki; Kawahara, Nobuo; Saito, Kazuki; Yamazaki, Mami

    2018-05-07

    The three Forsythia species, F. suspensa, F. viridissima and F. koreana, have been used as herbal medicines in China, Japan and Korea for centuries and they are known to be rich sources of numerous pharmaceutical metabolites, including forsythin, forsythoside A, arctigenin, rutin and other phenolic compounds. In this study, de novo transcriptome sequencing and assembly was performed on these species. Using leaf and flower tissues of F. suspensa, F. viridissima and F. koreana, 1.28-2.45-Gbp sequences of Illumina-based paired-end reads were obtained and assembled into 81,913, 88,491 and 69,458 unigenes, respectively. Classification of the annotated unigenes in gene ontology terms and KEGG pathways was used to compare the transcriptomes of the three Forsythia species. The expression analysis of orthologous genes across all three species showed that expression in leaf tissues was highly correlated. The candidate genes presumably involved in the biosynthetic pathway of lignans and phenylethanoid glycosides were screened as co-expressed genes. They are highly expressed in the leaves of F. viridissima and F. koreana. Furthermore, the three unigenes annotated as acyltransferase were predicted to be associated with the biosynthesis of acteoside and forsythoside A from the expression pattern and phylogenetic analysis. This study is the first report on comparative transcriptome analyses of the medicinally important Forsythia genus and will serve as an important resource to facilitate further studies on biosynthesis and regulation of therapeutic compounds in Forsythia species.

  7. Predicting Recurrence and Progression of Noninvasive Papillary Bladder Cancer at Initial Presentation Based on Quantitative Gene Expression Profiles

    DEFF Research Database (Denmark)

    Birkhahn, M.; Mitra, A.P.; Williams, Johan

    2010-01-01

    Background: Currently, tumor grade is the best predictor of outcome at first presentation of noninvasive papillary (Ta) bladder cancer. However, reliable predictors of Ta tumor recurrence and progression for individual patients, which could optimize treatment and follow-up schedules based on specific tumor biology, are yet to be identified. Objective: To identify genes predictive for recurrence and progression in Ta bladder cancer at first presentation using a quantitative, pathway-specific approach. Design, setting, and participants: Retrospective study of patients with Ta G2/3 bladder tumors at initial presentation with three distinct clinical outcomes: absence of recurrence (n = 16), recurrence without progression (n = 16), and progression to carcinoma in situ or invasive disease (n = 16). Measurements: Expressions of 24 genes that feature in relevant pathways that are deregulated in bladder...

  8. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    importance in the projection (VIP) information of the DPLS method. The power of the gene selection method and the proposed supervised hierarchical clustering method is illustrated on three microarray data sets of leukemia, breast, and colon cancer. Supervised machine learning algorithms thus enable...

  9. Gene expression analysis predicts insect venom anaphylaxis in indolent systemic mastocytosis

    NARCIS (Netherlands)

    Niedoszytko, M.; Bruinenberg, M.; van Doormaal, J. J.; de Monchy, J. G. R.; Nedoszytko, B.; Koppelman, G. H.; Nawijn, M. C.; Wijmenga, C.; Jassem, E.; Oude Elberink, J. N. G.

    Background: Anaphylaxis to insect venom (Hymenoptera) is most severe in patients with mastocytosis and may even lead to death. However, not all patients with mastocytosis suffer from anaphylaxis. The aim of the study was to analyze differences in gene expression between patients with indolent

  10. Dopamine Receptor D4 Gene Variation Predicts Preschoolers' Developing Theory of Mind

    Science.gov (United States)

    Lackner, Christine; Sabbagh, Mark A.; Hallinan, Elizabeth; Liu, Xudong; Holden, Jeanette J. A.

    2012-01-01

    Individual differences in preschoolers' understanding that human action is caused by internal mental states, or representational theory of mind (RTM), are heritable, as are developmental disorders such as autism in which RTM is particularly impaired. We investigated whether polymorphisms of genes affecting dopamine (DA) utilization and metabolism…

  11. Conservation of transcription factor binding events predicts gene expression across species

    Science.gov (United States)

    Hemberg, Martin; Kreiman, Gabriel

    2011-01-01

    Recent technological advances have made it possible to determine the genome-wide binding sites of transcription factors (TFs). Comparisons across species have suggested a relatively low degree of evolutionary conservation of experimentally defined TF binding events (TFBEs). Using binding data for six different TFs in hepatocytes and embryonic stem cells from human and mouse, we demonstrate that evolutionary conservation of TFBEs within orthologous proximal promoters is closely linked to function, defined as expression of the target genes. We show that (i) there is a significantly higher degree of conservation of TFBEs when the target gene is expressed in both species; (ii) there is increased conservation of binding events for groups of TFs compared to individual TFs; and (iii) conserved TFBEs have a greater impact on the expression of their target genes than non-conserved ones. These results link conservation of structural elements (TFBEs) to conservation of function (gene expression) and suggest a higher degree of functional conservation than implied by previous studies. PMID:21622661

  12. GENECODIS-Grid: An online grid-based tool to predict functional information in gene lists

    International Nuclear Information System (INIS)

    Nogales, R.; Mejia, E.; Vicente, C.; Montes, E.; Delgado, A.; Perez Griffo, F. J.; Tirado, F.; Pascual-Montano, A.

    2007-01-01

    In this work we introduce GeneCodis-Grid, a grid-based alternative to a bioinformatics tool named GeneCodis that integrates different sources of biological information to search for biological features (annotations) that frequently co-occur in a set of genes and rank them by statistical significance. GeneCodis-Grid is a web-based application that takes advantage of two independent grid networks and a computer cluster managed by a meta-scheduler and a web server that hosts the application. The mining of concurrent biological annotations provides significant information for the functional analysis of gene lists obtained by high-throughput experiments in biology. Due to the large popularity of this tool, which has registered more than 13000 visits since its publication in January 2007, there is a strong need to allow users from different sites to access the system simultaneously. In addition, the complexity of some of the statistical tests used in this approach has made this technique a good candidate for implementation in an opportunistic Grid environment. (Author)

  13. Genome wide gene expression regulation by HIP1 Protein Interactor, HIPPI: Prediction and validation

    Directory of Open Access Journals (Sweden)

    Lahiri Ansuman

    2011-09-01

    Full Text Available Abstract Background HIP1 Protein Interactor (HIPPI) is a pro-apoptotic protein that induces Caspase8 mediated apoptosis in cell. We have shown earlier that HIPPI could interact with a specific 9 bp sequence motif, defined as the HIPPI binding site (HBS), present in the upstream promoter of Caspase1 gene and regulate its expression. We also have shown that HIPPI, without any known nuclear localization signal, could be transported to the nucleus by HIP1, a NLS containing nucleo-cytoplasmic shuttling protein. Thus our present work aims at the investigation of the role of HIPPI as a global transcription regulator. Results We carried out genome wide search for the presence of HBS in the upstream sequences of genes. Our result suggests that HBS was predominantly located within 2 Kb upstream from transcription start site. Transcription factors like CREBP1, TBP, OCT1, EVI1 and P53 half site were significantly enriched in the 100 bp vicinity of HBS indicating that they might co-operate with HIPPI for transcription regulation. To illustrate the role of HIPPI on transcriptome, we performed gene expression profiling by microarray. Exogenous expression of HIPPI in HeLa cells resulted in up-regulation of 580 genes (p HIP1 was knocked down. HIPPI-P53 interaction was necessary for HIPPI mediated up-regulation of Caspase1 gene. Finally, we analyzed published microarray data obtained with post mortem brains of Huntington's disease (HD) patients to investigate the possible involvement of HIPPI in HD pathogenesis. We observed that along with the transcription factors like CREB, P300, SREBP1, Sp1 etc. which are already known to be involved in HD, HIPPI binding site was also significantly over-represented in the upstream sequences of genes altered in HD. Conclusions Taken together, the results suggest that HIPPI could act as an important transcription regulator in cell regulating a vast array of genes, particularly transcription factors and at least, in part, play a

  14. An Individual-Based Diploid Model Predicts Limited Conditions Under Which Stochastic Gene Expression Becomes Advantageous

    KAUST Repository

    Matsumoto, Tomotaka

    2015-11-24

    Recent studies suggest the existence of stochasticity in gene expression (SGE) in many organisms, and its non-negligible effect on their phenotype and fitness. To date, however, how SGE affects the key parameters of population genetics is not well understood. SGE can increase the phenotypic variation and act as a load for individuals, if they are at the adaptive optimum in a stable environment. On the other hand, part of the phenotypic variation caused by SGE might become advantageous if individuals at the adaptive optimum become genetically less-adaptive, for example due to an environmental change. Furthermore, SGE of unimportant genes might have little or no fitness consequences. Thus, SGE can be advantageous, disadvantageous, or selectively neutral depending on its context. In addition, there might be a genetic basis that regulates the magnitude of SGE, which is often referred to as “modifier genes,” but little is known about the conditions under which such an SGE-modifier gene evolves. In the present study, we conducted individual-based computer simulations to examine these conditions in a diploid model. In the simulations, we considered a single locus that determines organismal fitness for simplicity, and that SGE on the locus creates fitness variation in a stochastic manner. We also considered another locus that modifies the magnitude of SGE. Our results suggested that SGE was always deleterious in stable environments and increased the fixation probability of deleterious mutations in this model. Even under frequently changing environmental conditions, only very strong natural selection made SGE adaptive. These results suggest that the evolution of SGE-modifier genes requires strict balance among the strength of natural selection, magnitude of SGE, and frequency of environmental changes. However, the degree of dominance affected the condition under which SGE becomes advantageous, indicating a better opportunity for the evolution of SGE in different genetic

  15. Bayesian mixture models for assessment of gene differential behaviour and prediction of pCR through the integration of copy number and gene expression data.

    Directory of Open Access Journals (Sweden)

    Filippo Trentini

    Full Text Available We consider modeling jointly microarray RNA expression and DNA copy number data. We propose Bayesian mixture models that define latent Gaussian probit scores for the DNA and RNA, and integrate between the two platforms via a regression of the RNA probit scores on the DNA probit scores. Such a regression conveniently allows us to include additional sample specific covariates such as biological conditions and clinical outcomes. The two developed methods are aimed respectively to make inference on differential behaviour of genes in patients showing different subtypes of breast cancer and to predict the pathological complete response (pCR) of patients borrowing strength across the genomic platforms. Posterior inference is carried out via MCMC simulations. We demonstrate the proposed methodology using a published data set consisting of 121 breast cancer patients.

  16. Urban landscape genetics: canopy cover predicts gene flow between white-footed mouse (Peromyscus leucopus) populations in New York City.

    Science.gov (United States)

    Munshi-South, Jason

    2012-03-01

    In this study, I examine the influence of urban canopy cover on gene flow between 15 white-footed mouse (Peromyscus leucopus) populations in New York City parklands. Parks in the urban core are often highly fragmented, leading to rapid genetic differentiation of relatively nonvagile species. However, a diverse array of 'green' spaces may provide dispersal corridors through 'grey' urban infrastructure. I identify urban landscape features that promote genetic connectivity in an urban environment and compare the success of two different landscape connectivity approaches at explaining gene flow. Gene flow was associated with 'effective distances' between populations that were calculated based on per cent tree canopy cover using two different approaches: (i) isolation by effective distance (IED) that calculates the single best pathway to minimize passage through high-resistance (i.e. low canopy cover) areas, and (ii) isolation by resistance (IBR), an implementation of circuit theory that identifies all low-resistance paths through the landscape. IBR, but not IED, models were significantly associated with three measures of gene flow (Nm from F(ST) , BayesAss+ and Migrate-n) after factoring out the influence of isolation by distance using partial Mantel tests. Predicted corridors for gene flow between city parks were largely narrow, linear parklands or vegetated spaces that are not managed for wildlife, such as cemeteries and roadway medians. These results have implications for understanding the impacts of urbanization trends on native wildlife, as well as for urban reforestation efforts that aim to improve urban ecosystem processes. © 2012 Blackwell Publishing Ltd.

  17. Aberrant gene methylation in non-neoplastic mucosa as a predictive marker of ulcerative colitis-associated CRC.

    Science.gov (United States)

    Scarpa, Marco; Scarpa, Melania; Castagliuolo, Ignazio; Erroi, Francesca; Kotsafti, Andromachi; Basato, Silvia; Brun, Paola; D'Incà, Renata; Rugge, Massimo; Angriman, Imerio; Castoro, Carlo

    2016-03-01

    BACKGROUND: Promoter hypermethylation plays a major role in cancer through transcriptional silencing of critical genes. The aim of our study is to evaluate the methylation status of these genes in the colonic mucosa without dysplasia or adenocarcinoma at the different steps of sporadic and UC-related carcinogenesis and to investigate the possible role of genomic methylation as a marker of CRC. The expression of Dnmts 1 and 3A was significantly increased in UC-related carcinogenesis compared to non-inflammatory colorectal carcinogenesis. In non-neoplastic colonic mucosa, the number of methylated genes was significantly higher in patients with CRC and in those with UC-related CRC compared to the HC and UC patients and patients with dysplastic lesions of the colon. The number of methylated genes in non-neoplastic colonic mucosa predicted the presence of CRC with good accuracy in both non-inflammatory and inflammation-related CRC. Colonic mucosal samples were collected from healthy subjects (HC) (n = 30) and from patients with ulcerative colitis (UC) (n = 29), UC and dysplasia (n = 14), UC and cancer (n = 10), dysplastic adenoma (n = 14), and colon adenocarcinoma (n = 10). DNA methyltransferase-1, -3a and -3b mRNA expression was quantified by real-time qRT-PCR. The methylation status of CDH13, APC, MLH1, MGMT1 and RUNX3 gene promoters was assessed by methylation-specific PCR. Methylation status of APC, CDH13, MGMT, MLH1 and RUNX3 in the non-neoplastic mucosa may be used as a marker of CRC: these preliminary results could allow for the adjustment of a patient's surveillance interval and the selection of UC patients who should undergo intensive surveillance.

  18. Identification and Validation of a New Set of Five Genes for Prediction of Risk in Early Breast Cancer

    Directory of Open Access Journals (Sweden)

    Giorgio Mustacchi

    2013-05-01

    Full Text Available Molecular tests predicting the outcome of breast cancer patients based on gene expression levels can be used to assist in making treatment decisions after consideration of conventional markers. In this study we identified a subset of 20 mRNAs differentially regulated in breast cancer by analyzing several publicly available array gene expression data using R/Bioconductor package. Using RTqPCR we evaluated 261 consecutive invasive breast cancer cases not selected for age, adjuvant treatment, nodal and estrogen receptor status from paraffin embedded sections. The biological samples dataset was split into a training set (137 cases) and a validation set (124 cases). The gene signature was developed on the training set and a multivariate stepwise Cox analysis selected five genes independently associated with DFS: FGF18 (HR = 1.13, p = 0.05), BCL2 (HR = 0.57, p = 0.001), PRC1 (HR = 1.51, p = 0.001), MMP9 (HR = 1.11, p = 0.08), SERF1A (HR = 0.83, p = 0.007). These five genes were combined into a linear score (signature) weighted according to the coefficients of the Cox model, as: 0.125FGF18 − 0.560BCL2 + 0.409PRC1 + 0.104MMP9 − 0.188SERF1A (HR = 2.7, 95% CI = 1.9–4.0, p < 0.001). The signature was then evaluated on the validation set assessing the discrimination ability by a Kaplan Meier analysis, using the same cut offs classifying patients at low, intermediate or high risk of disease relapse as defined on the training set (p < 0.001). Our signature, after a further clinical validation, could be proposed as prognostic signature for disease free survival in breast cancer patients where the indication for adjuvant chemotherapy added to endocrine treatment is uncertain.
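
    The linear score quoted in this abstract can be written out as a short sketch. The coefficients are those given above; everything else is an assumption made for illustration, including the function names, the expression scale of the inputs, and the two cut-off values, which the abstract does not report.

```python
# Cox-model weights as quoted in the abstract.
COEFFICIENTS = {
    "FGF18": 0.125,
    "BCL2": -0.560,
    "PRC1": 0.409,
    "MMP9": 0.104,
    "SERF1A": -0.188,
}


def signature_score(expression):
    """Weighted sum of the five expression values (dict: gene -> value)."""
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(weight * expression[gene] for gene, weight in COEFFICIENTS.items())


def risk_group(score, low_cut, high_cut):
    """Three-way classification; both cut-offs are hypothetical placeholders."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "intermediate"
    return "high"
```

    In the study the cut-offs were fixed on the training set and reused unchanged on the validation set, which is why they are passed in as parameters here rather than derived from the data being scored.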

  19. Prediction of the damage-associated non-synonymous single nucleotide polymorphisms in the human MC1R gene.

    Science.gov (United States)

    Hepp, Diego; Gonçalves, Gislene Lopes; de Freitas, Thales Renato Ochotorena

    2015-01-01

    The melanocortin 1 receptor (MC1R) is involved in the control of melanogenesis. Polymorphisms in this gene have been associated with variation in skin and hair color and with elevated risk for the development of melanoma. Here we used 11 computational tools based on different approaches to predict the damage-associated non-synonymous single nucleotide polymorphisms (nsSNPs) in the coding region of the human MC1R gene. Among the 92 nsSNPs arranged according to the predictions 62% were classified as damaging in more than five tools. The classification was significantly correlated with the scores of two consensus programs. Alleles associated with the red hair color (RHC) phenotype and with the risk of melanoma were examined. The R variants D84E, R142H, R151C, I155T, R160W and D294H were classified as damaging by the majority of the tools while the r variants V60L, V92M and R163Q have been predicted as neutral in most of the programs. The combination of the prediction tools results in 14 nsSNPs indicated as the most damaging mutations in MC1R (L48P, R67W, H70Y, P72L, S83P, R151H, S172I, L206P, T242I, G255R, P256S, C273Y, C289R and R306H); C273Y was shown to be highly damaging in SIFT, Polyphen-2, MutPred, PANTHER and PROVEAN scores. The computational analysis proved capable of identifying the potentially damaging nsSNPs in MC1R, which are candidates for further laboratory studies of the functional and pharmacological significance of the alterations in the receptor and the phenotypic outcomes.

  20. A biological network-based regularized artificial neural network model for robust phenotype prediction from gene expression data.

    Science.gov (United States)

    Kang, Tianyu; Ding, Wei; Zhang, Luoyan; Ziemek, Daniel; Zarringhalam, Kourosh

    2017-12-19

    Stratification of patient subpopulations that respond favorably to treatment or experience and adverse reaction is an essential step toward development of new personalized therapies and diagnostics. It is currently feasible to generate omic-scale biological measurements for all patients in a study, providing an opportunity for machine learning models to identify molecular markers for disease diagnosis and progression. However, the high variability of genetic background in human populations hampers the reproducibility of omic-scale markers. In this paper, we develop a biological network-based regularized artificial neural network model for prediction of phenotype from transcriptomic measurements in clinical trials. To improve model sparsity and the overall reproducibility of the model, we incorporate regularization for simultaneous shrinkage of gene sets based on active upstream regulatory mechanisms into the model. We benchmark our method against various regression, support vector machines and artificial neural network models and demonstrate the ability of our method in predicting the clinical outcomes using clinical trial data on acute rejection in kidney transplantation and response to Infliximab in ulcerative colitis. We show that integration of prior biological knowledge into the classification as developed in this paper, significantly improves the robustness and generalizability of predictions to independent datasets. We provide a Java code of our algorithm along with a parsed version of the STRING DB database. In summary, we present a method for prediction of clinical phenotypes using baseline genome-wide expression data that makes use of prior biological knowledge on gene-regulatory interactions in order to increase robustness and reproducibility of omic-scale markers. The integrated group-wise regularization methods increases the interpretability of biological signatures and gives stable performance estimates across independent test sets.

  1. Prediction of the damage-associated non-synonymous single nucleotide polymorphisms in the human MC1R gene.

    Directory of Open Access Journals (Sweden)

    Diego Hepp

    Full Text Available The melanocortin 1 receptor (MC1R) is involved in the control of melanogenesis. Polymorphisms in this gene have been associated with variation in skin and hair color and with elevated risk for the development of melanoma. Here we used 11 computational tools based on different approaches to predict the damage-associated non-synonymous single nucleotide polymorphisms (nsSNPs) in the coding region of the human MC1R gene. Among the 92 nsSNPs arranged according to the predictions 62% were classified as damaging in more than five tools. The classification was significantly correlated with the scores of two consensus programs. Alleles associated with the red hair color (RHC) phenotype and with the risk of melanoma were examined. The R variants D84E, R142H, R151C, I155T, R160W and D294H were classified as damaging by the majority of the tools while the r variants V60L, V92M and R163Q have been predicted as neutral in most of the programs. The combination of the prediction tools results in 14 nsSNPs indicated as the most damaging mutations in MC1R (L48P, R67W, H70Y, P72L, S83P, R151H, S172I, L206P, T242I, G255R, P256S, C273Y, C289R and R306H); C273Y was shown to be highly damaging in SIFT, Polyphen-2, MutPred, PANTHER and PROVEAN scores. The computational analysis proved capable of identifying the potentially damaging nsSNPs in MC1R, which are candidates for further laboratory studies of the functional and pharmacological significance of the alterations in the receptor and the phenotypic outcomes.

  2. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  3. ETS Gene Fusions as Predictive Biomarkers of Resistance to Radiation Therapy for Prostate Cancer

    Science.gov (United States)

    2016-05-01

    phenotype in preclinical models of prostate cancer, 2) to explore the mechanism of interaction between ERG (the predominant ETS...established this axis as a potential therapeutic target. Subject terms: Prostate cancer, ETS gene fusions, ERG, radiation resistance, DNA...interaction between ERG (the predominant ETS gene fusion product) and the DNA repair protein DNA-PK, and 3) to

  4. Survivin gene levels in the peripheral blood of patients with gastric cancer independently predict survival

    Directory of Open Access Journals (Sweden)

    Scalerta Romano

    2009-12-01

    Full Text Available Abstract Background The detection of circulating tumor cells (CTC) is considered a promising tool for improving risk stratification in patients with solid tumors. We investigated whether the expression of CTC related genes adds any prognostic power to the TNM staging system in patients with gastric carcinoma. Methods Seventy patients with TNM stage I to IV gastric carcinoma were retrospectively enrolled. Peripheral blood samples were tested by means of quantitative real time PCR (qrtPCR) for the expression of four CTC related genes: carcinoembryonic antigen (CEA), cytokeratin-19 (CK19), vascular endothelial growth factor (VEGF) and Survivin (BIRC5). Results Gene expression of Survivin, CK19, CEA and VEGF was higher than in normal controls in 98.6%, 97.1%, 42.9% and 38.6% of cases, respectively, suggesting a potential diagnostic value of both Survivin and CK19. At multivariable survival analysis, TNM staging and Survivin mRNA levels were retained as independent prognostic factors, demonstrating that Survivin expression in the peripheral blood adds prognostic information to the TNM system. In contrast with previously published data, the transcript abundance of CEA, CK19 and VEGF was not associated with patients' clinical outcome. Conclusions Gene expression levels of Survivin add significant prognostic value to the current TNM staging system. The validation of these findings in larger prospective and multicentric series might lead to the implementation of this biomarker in the routine clinical setting in order to optimize risk stratification and ultimately personalize the therapeutic management of these patients.

  5. Cancer-Predicting Gene Expression Changes in Colonic Mucosa of Western Diet Fed Mlh1 +/- Mice

    Science.gov (United States)

    Dermadi Bebek, Denis; Valo, Satu; Reyhani, Nima; Ollila, Saara; Päivärinta, Essi; Peltomäki, Päivi; Mutanen, Marja; Nyström, Minna

    2013-01-01

    Colorectal cancer (CRC) is the second most common cause of cancer-related deaths in the Western world and interactions between genetic and environmental factors, including diet, are suggested to play a critical role in its etiology. We conducted a long-term feeding experiment in the mouse to address gene expression and methylation changes arising in histologically normal colonic mucosa as putative cancer-predisposing events available for early detection. The expression of 94 growth-regulatory genes previously linked to human CRC was studied at two time points (5 weeks and 12 months of age) in the heterozygote Mlh1 +/- mice, an animal model for human Lynch syndrome (LS), and wild type Mlh1 +/+ littermates, fed by either Western-style (WD) or AIN-93G control diet. In mice fed with WD, proximal colon mucosa, the predominant site of cancer formation in LS, exhibited a significant expression decrease in tumor suppressor genes, Dkk1, Hoxd1, Slc5a8, and Socs1, the latter two only in the Mlh1 +/- mice. Reduced mRNA expression was accompanied by increased promoter methylation of the respective genes. The strongest expression decrease (7.3 fold) together with a significant increase in its promoter methylation was seen in Dkk1, an antagonist of the canonical Wnt signaling pathway. Furthermore, the inactivation of Dkk1 seems to predispose to neoplasias in the proximal colon. This and the fact that Mlh1 which showed only modest methylation was still expressed in both Mlh1 +/- and Mlh1 +/+ mice indicate that the expression decreases and the inactivation of Dkk1 in particular is a prominent early marker for colon oncogenesis. PMID:24204690

  6. Cancer-predicting gene expression changes in colonic mucosa of Western diet fed Mlh1+/- mice.

    Directory of Open Access Journals (Sweden)

    Marjaana Pussila

    Full Text Available Colorectal cancer (CRC) is the second most common cause of cancer-related deaths in the Western world and interactions between genetic and environmental factors, including diet, are suggested to play a critical role in its etiology. We conducted a long-term feeding experiment in the mouse to address gene expression and methylation changes arising in histologically normal colonic mucosa as putative cancer-predisposing events available for early detection. The expression of 94 growth-regulatory genes previously linked to human CRC was studied at two time points (5 weeks and 12 months of age) in the heterozygote Mlh1(+/-) mice, an animal model for human Lynch syndrome (LS), and wild type Mlh1(+/+) littermates, fed by either Western-style (WD) or AIN-93G control diet. In mice fed with WD, proximal colon mucosa, the predominant site of cancer formation in LS, exhibited a significant expression decrease in tumor suppressor genes, Dkk1, Hoxd1, Slc5a8, and Socs1, the latter two only in the Mlh1(+/-) mice. Reduced mRNA expression was accompanied by increased promoter methylation of the respective genes. The strongest expression decrease (7.3 fold) together with a significant increase in its promoter methylation was seen in Dkk1, an antagonist of the canonical Wnt signaling pathway. Furthermore, the inactivation of Dkk1 seems to predispose to neoplasias in the proximal colon. This and the fact that Mlh1 which showed only modest methylation was still expressed in both Mlh1(+/-) and Mlh1(+/+) mice indicate that the expression decreases and the inactivation of Dkk1 in particular is a prominent early marker for colon oncogenesis.

  7. Characterizing haploinsufficiency of SHELL gene to improve fruit form prediction in introgressive hybrids of oil palm

    OpenAIRE

    Teh, Chee Keng; Muaz, Siti Dalila; Tangaya, Praveena; Fong, Po-Yee; Ong, Ai Ling; Mayes, Sean; Chew, Fook Tim; Kulaveerasingam, Harikrishna; Appleton, David Ross

    2017-01-01

    The fundamental trait in selective breeding of oil palm (Elaeis guineensis Jacq.) is the shell thickness surrounding the kernel. The monogenic shell thickness is inversely correlated to mesocarp thickness, where the crude palm oil accumulates. Commercial thin-shelled tenera derived from thick-shelled dura × shell-less pisifera generally contain 30% higher oil per bunch. Two mutations, sh MPOB (M1) and sh AVROS (M2) in the SHELL gene, a type II MADS-box transcription factor mainly present in ...

  8. Melanopsin Gene Variations Interact With Season to Predict Sleep Onset and Chronotype

    OpenAIRE

    Roecklein, Kathryn A.; Wong, Patricia M.; Franzen, Peter L.; Hasler, Brant P.; Wood-Vasey, W. Michael; Nimgaonkar, Vishwajit L.; Miller, Megan A.; Kepreos, Kyle M.; Ferrell, Robert E.; Manuck, Stephen B.

    2012-01-01

    The human melanopsin gene has been reported to mediate risk for seasonal affective disorder (SAD), which is hypothesized to be caused by decreased photic input during winter when light levels fall below threshold, resulting in differences in circadian phase and/or sleep. However, it is unclear if melanopsin increases risk of SAD by causing differences in sleep or circadian phase, or if those differences are symptoms of the mood disorder. To determine if melanopsin sequence variations are asso...

  9. A method of predicting changes in human gene splicing induced by genetic variants in context of cis-acting elements

    Directory of Open Access Journals (Sweden)

    Hicks Chindo

    2010-01-01

    Full Text Available Abstract Background Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference on non-canonical exons. Results Our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer's and breast cancer could be explained by changes in predicted splicing patterns. Conclusions We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.

  10. Association of TLL1 Gene Polymorphism (rs1503298, T > C) with Coronary Heart Disease in PREDICT, UDACS and ED Cohorts

    International Nuclear Information System (INIS)

    Zain, M.; Cooper, J. A.; Li, K. W.; Palmen, J.; Acharya, J.; Howard, P.; Ireland, H.; Humphries, S. E.; Awan, F. R.; Baig, S. M.; Elkeles, R. S.

    2014-01-01

    Objective: To determine the sequence variant of TLL1 gene (rs1503298, T > C) in three British cohorts (PREDICT, UDACS and ED) of patients with type-2 Diabetes mellitus (T2DM) in order to assess its association with coronary heart disease (CHD). Study Design: Analytical study. Place and Duration of Study: UCL, London, UK. Participants were genotyped in 2011-2012 for TLL1 SNP. Samples and related information were previously collected in 2001-2003 for PREDICT, and in 2001-2002 for UDACS and ED groups. Methodology: Patients included in PREDICT (n=600), UDACS (n=1020) and ED (n=1240) had Diabetes. TLL1 SNP (rs1503298, T > C) was genotyped using TaqMan technology. Allele frequencies were compared using the χ² test, and tested for Hardy-Weinberg equilibrium. The risk of disease was assessed from Odds ratios (OR) with 95% Confidence Intervals (95% CI). Moreover, for the PREDICT cohort, the SNP association was tested with Coronary Artery Calcification (CAC) scores. Results: No significant association was found for this SNP with CHD or CAC scores in these cohorts. Conclusion: This SNP could not be confirmed as a risk factor for CHD in T2DM patients. However, the low statistical power of the available sample size limits the ability to detect a modest effect on risk. Further studies in larger samples would be useful. (author)
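
    The two statistics named in this abstract, the Hardy-Weinberg equilibrium check and the odds ratio with a 95% confidence interval, can be sketched as follows. This is not the authors' code: the function names are hypothetical, the interval uses the Woolf (log-scale) approximation, which the abstract does not specify, and any genotype counts passed in would be the reader's own.

```python
import math


def hwe_chi_square(n_tt, n_tc, n_cc):
    """Chi-square statistic (1 degree of freedom) for departure from
    Hardy-Weinberg proportions, given observed genotype counts.
    Assumes both alleles are present, so no expected count is zero."""
    n = n_tt + n_tc + n_cc
    p = (2 * n_tt + n_tc) / (2 * n)  # frequency of the T allele
    q = 1.0 - p                      # frequency of the C allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_tt, n_tc, n_cc)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table:
    a = exposed cases,    b = exposed controls,
    c = unexposed cases,  d = unexposed controls (all counts > 0)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

    A statistic above about 3.84 would suggest departure from equilibrium at the 5% level, and an interval that spans 1 corresponds to the kind of non-significant association reported here.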

  11. Gene expression signature of normal cell-of-origin predicts ovarian tumor outcomes.

    Directory of Open Access Journals (Sweden)

    Melissa A Merritt

    Full Text Available The potential role of the cell-of-origin in determining the tumor phenotype has been raised, but not adequately examined. We hypothesized that distinct cells-of-origin may play a role in determining ovarian tumor phenotype and outcome. Here we describe a new cell culture medium for in vitro culture of paired normal human ovarian (OV) and fallopian tube (FT) epithelial cells from donors without cancer. While these cells have been cultured individually for short periods of time, to our knowledge this is the first long-term culture of both cell types from the same donors. Through analysis of the gene expression profiles of the cultured OV/FT cells we identified a normal cell-of-origin gene signature that classified primary ovarian cancers into OV-like and FT-like subgroups; this classification correlated with significant differences in clinical outcomes. The identification of a prognostically significant gene expression signature derived solely from normal untransformed cells is consistent with the hypothesis that the normal cell-of-origin may be a source of ovarian tumor heterogeneity and the associated differences in tumor outcome.

  12. Predicting incomplete gene microarray data with the use of supervised learning algorithms

    CSIR Research Space (South Africa)

    Twala, B

    2010-10-01

    Full Text Available that prediction using supervised learning can be improved in probabilistic terms given incomplete microarray data. This imputation approach is based on the a priori probability of each value determined from the instances at that node of a decision tree (PDT...

  13. The importance of virulence prediction and gene networks in microbial risk assessment

    DEFF Research Database (Denmark)

    Wassenaar, Gertrude Maria; Gamieldien, Junaid; Shatkin, JoAnne

    2007-01-01

    For microbial risk assessment, it is necessary to recognize and predict the virulence of bacterial pathogens, including their ability to contaminate foods. Hazard characterization requires data on strain variability regarding virulence and survival during food processing. Moreover, information...... and characterization of microbial hazards, including emerging pathogens, in the context of microbial risk assessment....

  14. Gene expression analysis in predicting the effectiveness of insect venom immunotherapy

    NARCIS (Netherlands)

    Niedoszytko, M.; Bruinenberg, M.; de Monchy, J.; Wijmenga, C.; Platteel, M.; Jassem, E.; Oude Elberink, Joanna N.G.

    Background: Venom immunotherapy (VIT) enables longtime prevention of insect venom allergy in the majority of patients. However, in some, the risk of a resystemic reaction increases after completion of treatment. No reliable factors predicting individual lack of efficacy of VIT are currently

  15. Assessment of the Prognostic and Treatment-Predictive Performance of the Combined HOXB13:IL17BR-MGI Gene Expression Signature in the Trans-ATAC Cohort

    Science.gov (United States)

    2013-12-01

    Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 2004; 351: 2817–26. 7...Barlow WE, Shak S, et al, for The Breast Cancer Intergroup of North America. Prognostic and predictive value of the 21-gene recurrence score assay in

  16. Role of Endothelial Nitric Oxide Synthase Gene Polymorphisms in Predicting Aneurysmal Subarachnoid Hemorrhage in South Indian Patients

    Directory of Open Access Journals (Sweden)

    Linda Koshy

    2008-01-01

    Full Text Available Endothelial nitric oxide synthase (eNOS) gene polymorphisms have been implicated as predisposing genetic factors that can predict aneurysmal subarachnoid hemorrhage (aSAH), but with controversial results from different populations. Using a case-control study design, we tested the hypothesis that variants in the eNOS gene can increase the risk of aSAH among South Indian patients, either independently or by interacting with other risk factors of the disease. We enrolled 122 patients, along with 224 ethnically matched controls. We screened the intron-4 27-bp VNTR, the promoter T-786C and the exon-7 G894T SNPs in the eNOS gene. We found marked interethnic differences in the genotype distribution of eNOS variants when comparing the South Indian population with the reported frequencies from Caucasian and Japanese populations. Genotype distributions in control and patient populations were found to be in Hardy-Weinberg equilibrium. In patients, the allele, genotype and estimated haplotype frequencies did not differ significantly from the controls. Multiple logistic regression indicated hypertension and smoking as risk factors for the disease; however, the risk alleles did not have any interaction with these risk factors. Although the eNOS polymorphisms were not found to be a likely risk factor for aSAH, the role of factors such as ethnicity, gender, smoking and hypertension should be evaluated cautiously to understand the genotype to phenotype conversion.

  17. SNPs in genes implicated in radiation response are associated with radiotoxicity and evoke roles as predictive and prognostic biomarkers

    International Nuclear Information System (INIS)

    Alsbeih, Ghazi; El-Sebaie, Medhat; Al-Harbi, Najla; Al-Hadyan, Khaled; Shoukri, Mohamed; Al-Rajhi, Nasser

    2013-01-01

    Biomarkers are needed to individualize cancer radiation treatment. Therefore, we have investigated the association between various risk factors, including single nucleotide polymorphisms (SNPs) in candidate genes and late complications to radiotherapy in our nasopharyngeal cancer patients. A cohort of 155 patients was included. Normal tissue fibrosis was scored using RTOG/EORTC grading system. A total of 45 SNPs in 11 candidate genes (ATM, XRCC1, XRCC3, XRCC4, XRCC5, PRKDC, LIG4, TP53, HDM2, CDKN1A, TGFB1) were genotyped by direct genomic DNA sequencing. Patients with severe fibrosis (cases, G3-4, n = 48) were compared to controls (G0-2, n = 107). Univariate analysis showed significant association (P < 0.05) with radiation complications for 6 SNPs (ATM G/A rs1801516, HDM2 promoter T/G rs2279744 and T/A rs1196333, XRCC1 G/A rs25487, XRCC5 T/C rs1051677 and TGFB1 C/T rs1800469). In addition, Kaplan-Meier analyses have also highlighted significant association between genotypes and length of patients’ follow-up after radiotherapy. Multivariate logistic regression has further sustained these results suggesting predictive and prognostic roles of SNPs. Univariate and multivariate analyses suggest that radiation toxicity in radiotherapy patients is associated with certain SNPs in genes including the HDM2 promoter, studied here for the first time. These results support the use of SNPs as genetic predictive markers for clinical radiosensitivity and evoke a prognostic role for length of patients’ follow-up after radiotherapy.

  18. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification

    DEFF Research Database (Denmark)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding...... the production of such compounds. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features...

  19. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    Science.gov (United States)

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2014-01-01

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
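The feature construction described in this abstract (a GO frequency vector, a semantic similarity vector, and a fusion of the two) might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the abstract does not give the fusion rule, so plain concatenation is used as a stand-in, and the similarity table, GO vocabulary, and function names are hypothetical rather than the HybridGO-Loc implementation.

```python
from collections import Counter


def frequency_vector(protein_terms, vocabulary):
    """Count how often each vocabulary GO term was retrieved for the protein."""
    counts = Counter(protein_terms)
    return [float(counts[term]) for term in vocabulary]


def similarity_vector(protein_terms, vocabulary, similarity):
    """For each vocabulary term, keep its best similarity to any of the
    protein's own GO terms (0.0 when the protein has no terms)."""
    distinct = set(protein_terms)
    return [
        max((similarity(term, own) for own in distinct), default=0.0)
        for term in vocabulary
    ]


def fusion_vector(protein_terms, vocabulary, similarity):
    """Assumed fusion rule: concatenate the two vectors. The published
    method may combine them differently."""
    return (frequency_vector(protein_terms, vocabulary)
            + similarity_vector(protein_terms, vocabulary, similarity))


# Hypothetical similarity scores between pairs of GO terms.
TOY_SIMILARITY = {frozenset(("GO:0005634", "GO:0005737")): 0.4}


def toy_similarity(term_a, term_b):
    """Toy stand-in for a real semantic similarity measure."""
    if term_a == term_b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((term_a, term_b)), 0.0)


vocabulary = ["GO:0005634", "GO:0005737", "GO:0005886"]
protein_terms = ["GO:0005634", "GO:0005634"]  # retrieved from two homologs
fused = fusion_vector(protein_terms, vocabulary, toy_similarity)
```

The similarity half stays informative for vocabulary terms the protein never received, which is the kind of inter-term relationship the abstract says occurrence counts alone disregard.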

  20. Fine and Predictable Tuning of TALEN Gene Editing Targeting for Improved T Cell Adoptive Immunotherapy.

    Science.gov (United States)

    Gautron, Anne-Sophie; Juillerat, Alexandre; Guyot, Valérie; Filhol, Jean-Marie; Dessez, Emilie; Duclert, Aymeric; Duchateau, Philippe; Poirot, Laurent

    2017-12-15

    Using a TALEN-mediated gene-editing approach, we have previously described a process for the large-scale manufacturing of "off-the-shelf" CAR T cells from third-party donor T cells by disrupting the gene encoding TCRα constant chain (TRAC). Taking advantage of a previously described strategy to control TALEN targeting based on the exclusion capacities of non-conventional RVDs, we have developed highly efficient and specific nucleases targeting a key T cell immune checkpoint, PD-1, to improve engineered CAR T cells' functionalities. Here, we demonstrate that this approach allows combined TRAC and PDCD1 TALEN processing at the desired locus while eliminating low-frequency off-site processing. Thus, by replacing few RVDs, we provide here an easy and rapid redesign of optimal TALEN combinations. We anticipate that this method can greatly benefit multiplex editing, which is of key importance especially for therapeutic applications where high editing efficiencies need to be associated with maximal specificity and safety. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  1. Prediction of Toxin Genes from Chinese Yellow Catfish Based on Transcriptomic and Proteomic Sequencing

    Directory of Open Access Journals (Sweden)

    Bing Xie

    2016-04-01

    Full Text Available Fish venom remains a virtually untapped resource. So few fish toxin sequences are available for reference that it is difficult to study toxins from venomous fish and to develop efficient and fast methods to dig out toxin genes or proteins. Here, we utilized Chinese yellow catfish (Pelteobagrus fulvidraco) as our research object, since it is a representative species in Siluriformes with its venom glands embedded in the pectoral and dorsal fins. In this study, we set up an in-house toxin database and a novel toxin-discovering protocol to dig out precise toxin genes by combination of transcriptomic and proteomic sequencing. Finally, we obtained 15 putative toxin proteins distributed in five groups, namely Veficolin, Ink toxin, Adamalysin, Za2G and CRISP toxin. It seems that we have developed a novel bioinformatics method, through which we could identify toxin proteins with high confidence. Meanwhile, these toxins can also be useful for comparative studies in other fish and development of potential drugs.

  2. Relative codon adaptation: a generic codon bias index for prediction of gene expression.

    Science.gov (United States)

    Fox, Jesse M; Erill, Ivan

    2010-06-01

    The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed and based on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and of biases in the small data sets used for validation in E. coli. Furthermore, we reveal how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
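For orientation, the codon adaptation index (CAI) that this abstract uses as its baseline can be sketched as follows. The sketch follows the standard definition (each codon's relative adaptiveness is its reference-set count divided by the count of the most used synonymous codon, and CAI is the geometric mean over the gene); it does not implement RCA or RCBS, whose formulas the abstract does not give. The two synonymous groups and the reference counts are hypothetical toy data.

```python
import math
from collections import Counter


def relative_adaptiveness(reference_codons, synonymous_groups):
    """w(codon) = count in reference set / count of the most frequent
    synonymous codon in that set. Codons unseen in the reference are skipped."""
    counts = Counter(reference_codons)
    weights = {}
    for group in synonymous_groups:
        top = max(counts[codon] for codon in group)
        for codon in group:
            if top > 0 and counts[codon] > 0:
                weights[codon] = counts[codon] / top
    return weights


def cai(gene_codons, weights):
    """Geometric mean of the weights of the gene's scorable codons."""
    logs = [math.log(weights[codon]) for codon in gene_codons if codon in weights]
    if not logs:
        raise ValueError("gene contains no codons covered by the reference set")
    return math.exp(sum(logs) / len(logs))


# Toy data: only lysine (AAA/AAG) and glutamate (GAA/GAG) codons.
groups = [("AAA", "AAG"), ("GAA", "GAG")]
reference = ["AAA"] * 3 + ["AAG"] + ["GAA"] * 2 + ["GAG"] * 2
weights = relative_adaptiveness(reference, groups)

well_adapted = cai(["AAA", "GAA"], weights)    # uses only preferred codons
poorly_adapted = cai(["AAG", "AAG"], weights)  # uses only the rare lysine codon
```

A real calculation would cover every amino acid with more than one codon and would take its reference set from highly expressed genes.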

  3. Fine and Predictable Tuning of TALEN Gene Editing Targeting for Improved T Cell Adoptive Immunotherapy

    Directory of Open Access Journals (Sweden)

    Anne-Sophie Gautron

    2017-12-01

    Full Text Available Using a TALEN-mediated gene-editing approach, we have previously described a process for the large-scale manufacturing of “off-the-shelf” CAR T cells from third-party donor T cells by disrupting the gene encoding TCRα constant chain (TRAC). Taking advantage of a previously described strategy to control TALEN targeting based on the exclusion capacities of non-conventional RVDs, we have developed highly efficient and specific nucleases targeting a key T cell immune checkpoint, PD-1, to improve engineered CAR T cells’ functionalities. Here, we demonstrate that this approach allows combined TRAC and PDCD1 TALEN processing at the desired locus while eliminating low-frequency off-site processing. Thus, by replacing few RVDs, we provide here an easy and rapid redesign of optimal TALEN combinations. We anticipate that this method can greatly benefit multiplex editing, which is of key importance especially for therapeutic applications where high editing efficiencies need to be associated with maximal specificity and safety.

  4. PPARα gene variants as predicted performance-enhancing polymorphisms in professional Italian soccer players

    Directory of Open Access Journals (Sweden)

    Proia P

    2014-12-01

    Full Text Available Patrizia Proia,1 Antonino Bianco,1 Gabriella Schiera,2 Patrizia Saladino,2 Valentina Contrò,1 Giovanni Caramazza,3 Marcello Traina,1 Keith A Grimaldi,4 Antonio Palma,1 Antonio Paoli5 1Sport and Exercise Sciences Research Unit, 2Department of Biological, Chemical and Pharmaceutical Sciences and Technologies, University of Palermo, Palermo, Italy; 3Regional Sports School of CONI Sicilia, Sicily, Italy; 4Biomedical Engineering Laboratory, Institute of Communication and Computer Systems, National Technical University of Athens, Athens, Greece; 5Department of Biomedical Sciences, University of Padova, Padua, Italy Background: The PPARα gene encodes the peroxisome proliferator-activated receptor alpha, a central regulator of expression of other genes involved in fatty acid metabolism. The purpose of this study was to determine the prevalence of the G allele of the PPARα intron 7 G/C polymorphism (rs4253778) in professional Italian soccer players. Methods: Sixty professional soccer players and 30 sedentary volunteers were enrolled in the study. Samples of venous blood were obtained at rest, in the morning, by conventional clinical procedures; blood serum was collected and total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and triglycerides were measured. An aliquot of anticoagulant-treated blood was used to prepare genomic DNA from whole blood. The G/C polymorphic site in PPARα intron 7 was scanned by using the PCR-RFLP (polymerase chain reaction restriction fragment length polymorphism) protocol with TaqI enzyme. Results: We found variations in genotype distribution of PPARα polymorphism between professional soccer players and sedentary volunteers. Particularly, G alleles and the GG genotype were significantly more frequent in soccer players compared with healthy controls (64% versus 48%). No significant correlations were found between lipid profile and genotype background. Conclusion: Previous results

  5. A combination of dopamine genes predicts success by professional Wall Street traders.

    Science.gov (United States)

    Sapra, Steve; Beavin, Laura E; Zak, Paul J

    2012-01-01

    What determines success on Wall Street? This study examined whether genes affecting dopamine levels of professional traders were associated with their career tenure. Sixty professional Wall Street traders were genotyped and compared to a control group who did not trade stocks. We found that distinct alleles of the dopamine receptor 4 promoter (DRD4P) and catechol-O-methyltransferase (COMT) that affect synaptic dopamine were predominant in traders. These alleles are associated with moderate, rather than very high or very low, levels of synaptic dopamine. The activity of these alleles correlated positively with years spent trading stocks on Wall Street. Differences in personality and trading behavior were also correlated with allelic variants. This evidence suggests there may be a genetic basis for the traits that make one a successful trader.

  6. REST mediates androgen receptor actions on gene repression and predicts early recurrence of prostate cancer

    DEFF Research Database (Denmark)

    Svensson, Charlotte; Ceder, Jens; Iglesias Gato, Diego

    2014-01-01

    The androgen receptor (AR) is a key regulator of prostate tumorigenesis through actions that are not fully understood. We identified the repressor element (RE)-1 silencing transcription factor (REST) as a mediator of AR actions on gene repression. Chromatin immunoprecipitation showed that AR binds...... in cell cycle progression, including Aurora Kinase A, that has previously been implicated in the growth of NE-like castration-resistant tumors. The analysis of prostate cancer tissue microarrays revealed that tumors with reduced expression of REST have higher probability of early recurrence, independently...... of their Gleason score. The demonstration that REST modulates AR actions in prostate epithelia and that REST expression is negatively correlated with disease recurrence after prostatectomy invites a deeper characterization of its role in prostate carcinogenesis....

  7. Modalities of gene action predicted by the classical evolutionary biological theory of aging.

    Science.gov (United States)

    Martin, George M

    2007-04-01

    What might now be referred to as the "classical" evolutionary biological theory of why we age has had a number of serious challenges in recent years. While the theory might therefore have to be modified under certain circumstances, in the author's opinion, it still provides the soundest theoretical basis for thinking about how we age. Nine modalities of gene action that have the potential to modulate processes of aging are reviewed, including the two most widely reviewed and accepted concepts ("antagonistic pleiotropy" and "mutation accumulation"). While several of these nine mechanisms can be regarded as derivatives of the antagonistic pleiotropic concept, they frame more specific questions for future research. Such research should pursue what appears to be the dominant factor in the determination of intraspecific variations in longevity: stochastic mechanisms, most likely based upon epigenetics. This contrasts with the dominant factor in the determination of interspecific variations in longevity: the constitutional genome, most likely based upon variations in regulatory loci.

  8. The interaction between aggrecan gene VNTR polymorphism and obesity in predicting incident symptomatic lumbar disc herniation.

    Science.gov (United States)

    Cong, Lin; Zhu, Yue; Pang, Hao; Guanjun, T U

    2014-01-01

    An association between aggrecan gene variable number of tandem repeats polymorphism (VNTR) and symptomatic lumbar disc herniation (LDH) has been reported in Chinese Han of Northern China, and obesity had previously been suspected of causing severe LDH. However, the interaction between aggrecan VNTR and obesity in symptomatic LDH has not been well studied. To examine the interaction between aggrecan VNTR and obesity in the susceptibility of symptomatic LDH, 259 participants participated in this study and donated a blood sample. The disease group comprised 61 patients already diagnosed with symptomatic LDH. The control group consisted of 198 healthy blood donors without symptoms of LDH who were not diagnosed with LDH. The aggrecan gene VNTR region was analyzed using polymerase chain reaction. The data indicated that between the two groups, participants carrying one or two alleles ≤25 repeats who were non-obese people showed a 1.057-fold increase in risk for symptomatic LDH (p = 0.895, changing the number of repeat alleles to 25 repeats who were obese people showed an 1.061-fold higher risk (p = 0.885, adding obesity to the mix alone did not demonstrably increase the risk of LDH), while participants carrying one or two alleles ≤25 repeats who were obese people showed a 4.667-fold increase in risk for symptomatic LDH (p = 0.0003, adding obesity plus changing the repeat allele number significantly increased the risk of LDH by 4.667). Overall, the findings suggest an underlying interaction between aggrecan VNTR and obesity in symptomatic LDH.

  9. A network-based predictive gene-expression signature for adjuvant chemotherapy benefit in stage II colorectal cancer.

    Science.gov (United States)

    Cao, Bangrong; Luo, Liping; Feng, Lin; Ma, Shiqi; Chen, Tingqing; Ren, Yuan; Zha, Xiao; Cheng, Shujun; Zhang, Kaitai; Chen, Changmin

    2017-12-13

    The clinical benefit of adjuvant chemotherapy for stage II colorectal cancer (CRC) is controversial. This study aimed to explore a novel gene signature to predict outcome benefit of postoperative 5-Fu-based therapy in stage II CRC. Gene-expression profiles of stage II CRCs from two datasets with 5-Fu-based adjuvant chemotherapy (training dataset, n = 212; validation dataset, n = 85) were analyzed to identify the indicator. A systemic approach integrating gene-expression and protein-protein interaction (PPI) network data was implemented to develop the predictive signature. Kaplan-Meier curves and Cox proportional hazards model were used to determine the survival benefit of adjuvant chemotherapy. Experiments with shRNA knock-down were carried out to confirm the signature identified in this study. In the training dataset, we identified 44 PPI sub-modules, by which we separated patients into two clusters (1 and 2) having different chemotherapeutic benefit. A predictor of 11 PPI sub-modules (11-PPI-Mod) was established to discriminate the two sub-groups, with an overall accuracy of 90.1%. This signature was independently validated in an external validation dataset. Kaplan-Meier curves showed an improved outcome for patients who received adjuvant chemotherapy in Cluster 1 sub-group, but even worse survival for those in Cluster 2 sub-group. Similar results were found in both the training and the validation dataset. Multivariate Cox regression revealed an interaction effect between 11-PPI-Mod signature and adjuvant therapy treatment in the training dataset (RFS, p = 0.007; OS, p = 0.006) and the validation dataset (RFS, p = 0.002). From the signature, we found that PTGES gene was up-regulated in CRC cells which were more resistant to 5-Fu. Knock-down of PTGES indicated a growth inhibition and up-regulation of apoptotic markers induced by 5-Fu in CRC cells. Only a small proportion of stage II CRC patients could benefit from adjuvant therapy. The 11-PPI-Mod as

  10. MO-DE-207B-05: Predicting Gene Mutations in Renal Cell Carcinoma Based On CT Imaging Features: Validation Using TCGA-TCIA Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Chen, X; Zhou, Z; Thomas, K; Wang, J [UT Southwestern Medical Center, Dallas, TX (United States)

    2016-06-15

    Purpose: The goal of this work is to investigate the use of contrast enhanced computed tomographic (CT) features for the prediction of mutations of BAP1, PBRM1, and VHL genes in renal cell carcinoma (RCC). Methods: For this study, we used two patient databases with renal cell carcinoma (RCC). The first one consisted of 33 patients from our institution (UT Southwestern Medical Center, UTSW). The second one consisted of 24 patients from the Cancer Imaging Archive (TCIA), where each patient is connected by a unique identifier to the tissue samples from the Cancer Genome Atlas (TCGA). From the contrast enhanced CT image of each patient, tumor contour was first delineated by a physician. Geometry, intensity, and texture features were extracted from the delineated tumor. Based on UTSW dataset, we completed feature selection and trained a support vector machine (SVM) classifier to predict mutations of BAP1, PBRM1 and VHL genes. We then used TCIA-TCGA dataset to validate the predictive model built upon UTSW dataset. Results: The prediction accuracy for gene mutations in TCIA-TCGA patients was 0.83 (20 of 24), 0.83 (20 of 24), and 0.75 (18 of 24) for BAP1, PBRM1, and VHL respectively. For BAP1 gene, texture feature was the most prominent feature type. For PBRM1 gene, intensity feature was the most prominent. For VHL gene, geometry, intensity, and texture features were all important. Conclusion: Using our feature selection strategy and models, we achieved predictive accuracy of at least 0.75 for all three genes under the condition of using patient data from one institution for training and data from other institutions for testing. These results suggest that radiogenomics can be used to aid in prognosis and used as convenient surrogates for expensive and time consuming gene assay procedures.

  11. Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods

    Directory of Open Access Journals (Sweden)

    Mark Burton

    2012-01-01

    Full Text Available Machine learning has increasingly been used with microarray gene expression data and for the development of classifiers using a variety of methods. However, method comparisons in cross-study datasets are very scarce. This study compares the performance of seven classification methods and the effect of voting for predicting metastasis outcome in breast cancer patients, in three situations: within the same dataset or across datasets on similar or dissimilar microarray platforms. Combining classification results from seven classifiers into one voting decision performed significantly better during internal validation as well as external validation in similar microarray platforms than the underlying classification methods. When validating between different microarray platforms, random forest, another voting-based method, proved to be the best performing method. We conclude that voting based classifiers provided an advantage with respect to classifying metastasis outcome in breast cancer patients.
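The voting step described in this abstract, where the outputs of seven classifiers are combined into one decision per patient, can be illustrated with a plain majority vote. This is a minimal sketch under assumptions: the abstract does not state the exact voting rule, the function name is hypothetical, and the predictions below are made-up labels (1 = metastasis, 0 = no metastasis), not study data.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label per sample by
    taking the most common label across classifiers."""
    n_samples = len(predictions_per_classifier[0])
    if any(len(preds) != n_samples for preds in predictions_per_classifier):
        raise ValueError("every classifier must predict the same samples")
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical outputs of seven classifiers for three patients.
predictions = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
combined = majority_vote(predictions)
```

With an odd number of classifiers and two classes, a tie cannot occur, so no tie-breaking rule is needed in this setting.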

  12. EMX2 gene expression predicts liver metastasis and survival in colorectal cancer.

    Science.gov (United States)

    Aykut, Berk; Ochs, Markus; Radhakrishnan, Praveen; Brill, Adrian; Höcker, Hermine; Schwarz, Sandra; Weissinger, Daniel; Kehm, Roland; Kulu, Yakup; Ulrich, Alexis; Schneider, Martin

    2017-08-22

    The Empty Spiracles Homeobox (EMX-) 2 gene has been associated with regulation of growth and differentiation in neuronal development. While recent studies provide evidence that EMX2 regulates tumorigenesis of various solid tumors, its role in colorectal cancer remains unknown. We aimed to assess the prognostic significance of EMX2 expression in stage III colorectal adenocarcinoma. Expression levels of EMX2 in human colorectal cancer and adjacent mucosa were assessed by qRT-PCR technology, and results were correlated with clinical and survival data. siRNA-mediated knockdown and adenoviral delivery-mediated overexpression of EMX2 were performed in order to investigate its effects on the migration of colorectal cancer cells in vitro. Compared to corresponding healthy mucosa, colorectal tumor samples had decreased EMX2 expression levels. Furthermore, EMX2 down-regulation in colorectal cancer tissue was associated with distant metastasis (M1) and impaired overall patient survival. In vitro knockdown of EMX2 resulted in increased tumor cell migration. Conversely, overexpression of EMX2 led to an inhibition of tumor cell migration. EMX2 is frequently down-regulated in human colorectal cancer, and down-regulation of EMX2 is a prognostic marker for disease-free and overall survival. EMX2 might thus represent a promising therapeutic target in colorectal cancer.

  13. Genetic variants in fanconi anemia pathway genes BRCA2 and FANCA predict melanoma survival.

    Science.gov (United States)

    Yin, Jieyun; Liu, Hongliang; Liu, Zhensheng; Wang, Li-E; Chen, Wei V; Zhu, Dakai; Amos, Christopher I; Fang, Shenying; Lee, Jeffrey E; Wei, Qingyi

    2015-02-01

    Cutaneous melanoma (CM) is the most lethal skin cancer. The Fanconi anemia (FA) pathway involved in DNA crosslink repair may affect CM susceptibility and prognosis. Using data derived from a published genome-wide association study, we comprehensively analyzed the associations of 2,339 common single-nucleotide polymorphisms (SNPs) in 14 autosomal FA genes with overall survival (OS) in 858 CM patients. By performing false-positive report probability corrections and stepwise Cox proportional hazards regression analyses, we identified significant associations between CM OS and four putatively functional SNPs: BRCA2 rs10492396 (AG vs. GG: adjusted hazard ratio (adjHR)=1.85, 95% confidence interval (CI)=1.16-2.95, P=0.010), rs206118 (CC vs. TT+TC: adjHR=2.44, 95% CI=1.27-4.67, P=0.007), rs3752447 (CC vs. TT+TC: adjHR=2.10, 95% CI=1.38-3.18, P=0.0005), and FANCA rs62068372 (TT vs. CC+CT: adjHR=1.85, 95% CI=1.27-2.69, P=0.001). Moreover, patients with an increasing number of unfavorable genotypes (NUG) of these loci had markedly reduced OS and melanoma-specific survival (MSS). The final model incorporating NUG, tumor stage, and Breslow thickness showed an improved discriminatory ability to classify both 5-year OS and 5-year MSS. Additional investigations, preferably prospective studies, are needed to validate our findings.

  14. A Common Variant in the SETD7 Gene Predicts Serum Lycopene Concentrations.

    Science.gov (United States)

    D'Adamo, Christopher R; D'Urso, Antonietta; Ryan, Kathleen A; Yerges-Armstrong, Laura M; Semba, Richard D; Steinle, Nanette I; Mitchell, Braxton D; Shuldiner, Alan R; McArdle, Patrick F

    2016-02-06

    Dietary intake and higher serum concentrations of lycopene have been associated with lower incidence of prostate cancer and other chronic diseases. Identifying determinants of serum lycopene concentrations may thus have important public health implications. Prior studies have suggested that serum lycopene concentrations are under partial genetic control. The goal of this research was to identify genetic predictors of serum lycopene concentrations using the genome-wide association study (GWAS) approach among a sample of 441 Old Order Amish adults that consumed a controlled diet. Linear regression models were utilized to evaluate associations between genetic variants and serum concentrations of lycopene. Variant rs7680948 on chromosome 4, located in the intron region of the SETD7 gene, was significantly associated with serum lycopene concentrations (p = 3.41 × 10⁻⁹). Our findings also provided nominal support for the association previously noted between SCARB1 and serum lycopene concentrations, although with a different SNP (rs11057841) in the region. This study identified a novel locus associated with serum lycopene concentrations and our results raise a number of intriguing possibilities regarding the nature of the relationship between SETD7 and lycopene, both of which have been independently associated with prostate cancer. Further investigation into this relationship might help provide greater mechanistic understanding of these associations.

  15. A Common Variant in the SETD7 Gene Predicts Serum Lycopene Concentrations

    Directory of Open Access Journals (Sweden)

    Christopher R. D’Adamo

    2016-02-01

    Full Text Available Dietary intake and higher serum concentrations of lycopene have been associated with lower incidence of prostate cancer and other chronic diseases. Identifying determinants of serum lycopene concentrations may thus have important public health implications. Prior studies have suggested that serum lycopene concentrations are under partial genetic control. The goal of this research was to identify genetic predictors of serum lycopene concentrations using the genome-wide association study (GWAS) approach among a sample of 441 Old Order Amish adults that consumed a controlled diet. Linear regression models were utilized to evaluate associations between genetic variants and serum concentrations of lycopene. Variant rs7680948 on chromosome 4, located in the intron region of the SETD7 gene, was significantly associated with serum lycopene concentrations (p = 3.41 × 10⁻⁹). Our findings also provided nominal support for the association previously noted between SCARB1 and serum lycopene concentrations, although with a different SNP (rs11057841) in the region. This study identified a novel locus associated with serum lycopene concentrations and our results raise a number of intriguing possibilities regarding the nature of the relationship between SETD7 and lycopene, both of which have been independently associated with prostate cancer. Further investigation into this relationship might help provide greater mechanistic understanding of these associations.

  16. Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms.

    Science.gov (United States)

    Falda, Marco; Toppo, Stefano; Pescarolo, Alessandro; Lavezzo, Enrico; Di Camillo, Barbara; Facchinetti, Andrea; Cilia, Elisa; Velasco, Riccardo; Fontana, Paolo

    2012-03-28

    Predicting protein function has become increasingly demanding in the era of next generation sequencing technology. The task to assign a curator-reviewed function to every single sequence is impracticable. Bioinformatics tools, easy to use and able to provide automatic and reliable annotations at a genomic scale, are necessary and urgent. In this scenario, the Gene Ontology has provided the means to standardize the annotation classification with a structured vocabulary which can be easily exploited by computational methods. Argot2 is a web-based function prediction tool able to annotate nucleic or protein sequences from small datasets up to entire genomes. It accepts as input a list of sequences in FASTA format, which are processed using BLAST and HMMER searches vs UniProtKB and Pfam databases respectively; these sequences are then annotated with GO terms retrieved from the UniProtKB-GOA database and the terms are weighted using the e-values from BLAST and HMMER. The weighted GO terms are processed according to both their semantic similarity relations described by the Gene Ontology and their associated score. The algorithm is based on the original idea developed in a previous tool called Argot. The entire engine has been completely rewritten to improve both accuracy and computational efficiency, thus allowing for the annotation of complete genomes. The revised algorithm has already been employed and successfully tested during in-house genome projects of grape and apple, and has proven to have a high precision and recall in all our benchmark conditions. It has also been successfully compared with Blast2GO, one of the methods most commonly employed for sequence annotation. The server is freely accessible at http://www.medcomp.medicina.unipd.it/Argot2.
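
    The e-value weighting step described in this abstract can be sketched as follows. This is only an illustration, assuming each database hit contributes -log10(e-value) to every GO term it carries; the exact weighting and the semantic-similarity grouping used by Argot2 are not given in the abstract, and the hits, the e-value floor and the name `weight_go_terms` are hypothetical.

```python
import math
from collections import defaultdict

def weight_go_terms(hits, floor=1e-300):
    """Sum -log10(e-value) per GO term over all database hits."""
    scores = defaultdict(float)
    for evalue, go_terms in hits:
        weight = -math.log10(max(evalue, floor))  # floor avoids log10(0)
        if weight <= 0:
            continue  # e-values >= 1 add no supporting weight
        for term in go_terms:
            scores[term] += weight
    return dict(scores)

# Hypothetical hits: (e-value, GO terms annotated on the matched entry).
hits = [
    (1e-50, ["GO:0003677", "GO:0005634"]),
    (1e-10, ["GO:0003677"]),
    (5.0, ["GO:0016020"]),
]
scores = weight_go_terms(hits)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 1))
```

    Terms supported by several strong hits accumulate the largest scores, while the weak hit (e-value 5.0) is ignored.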

  17. RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

    Science.gov (United States)

    Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G

    2017-01-01

    Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/. The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data
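
    The idea of a quadratic normalization fitted on invariant genes can be sketched as below. This is not the published implementation: the dynamic-programming selection of invariant genes is omitted and the invariant set is simply assumed to be given, the expression values are toy numbers, and `fit_quadratic` and `normalize` are hypothetical names.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]          # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def normalize(sample, reference, invariant):
    """Map every value in `sample` onto the scale of `reference`."""
    a, b, c = fit_quadratic([sample[g] for g in invariant],
                            [reference[g] for g in invariant])
    return {g: a * v * v + b * v + c for g, v in sample.items()}

# Toy expression values (assumption); g1..g4 are treated as invariant.
reference = {"g1": 3.5, "g2": 7.0, "g3": 11.5, "g4": 17.0}
sample = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "geneX": 5.0}
normalized = normalize(sample, reference, ["g1", "g2", "g3", "g4"])
print(round(normalized["geneX"], 3))
```

    Because the curve is fitted only on genes assumed not to change, genes outside that set (here `geneX`) are rescaled without influencing the fit.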

  18. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques.

    Science.gov (United States)

    Yu, Ning; Guo, Xuan; Gu, Feng; Pan, Yi

    2016-03-01

    Conventional character-analysis-based techniques in genome analysis manifest three main shortcomings: inefficiency, inflexibility, and incompatibility. In our previous research, a general framework called DNA As X was proposed for character-analysis-free techniques to overcome these shortcomings, where X is an intermediate representation such as digit, code, signal, vector, tree, graph network, and so on. In this paper, we further implement an ontology of DNA As Signal by designing a tool named Signalign for comparative gene structure analysis, in which DNA sequences are converted into signal series, processed by a modified method of dynamic time warping, and measured by signal-to-noise ratio (SNR). The ontology of DNA As Signal integrates the principles and concepts of other disciplines, including information coding theory and signal processing, into sequence analysis and processing. Compared with conventional character-analysis-based methods, Signalign not only achieves equivalent or superior performance, but also enriches the tools and the knowledge library of computational biology by extending the domain from character/string to diverse areas. The evaluation results validate the success of the character-analysis-free technique for improved performance in comparative gene structure prediction.
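
    The pipeline of converting DNA into a signal series and comparing series by dynamic time warping can be sketched as below. This shows the classic textbook DTW recurrence, not the modified method used by Signalign; the nucleotide-to-number encoding is an arbitrary assumption chosen for illustration, and the SNR measurement step is omitted.

```python
# Hypothetical nucleotide-to-number encoding, for illustration only.
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def to_signal(seq):
    """Convert a DNA string into a numeric series."""
    return [ENCODING[base] for base in seq.upper()]

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two numeric series."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

print(dtw_distance(to_signal("ACGT"), to_signal("AACGT")))
print(dtw_distance(to_signal("ACGT"), to_signal("ACTT")))
```

    Warping lets a repeated base align to a single base at no cost, which is why series of different lengths can still be compared directly.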

  19. Interaction of the ADRB2 gene polymorphism with childhood trauma in predicting adult symptoms of posttraumatic stress disorder.

    Science.gov (United States)

    Liberzon, Israel; King, Anthony P; Ressler, Kerry J; Almli, Lynn M; Zhang, Peng; Ma, Sean T; Cohen, Gregory H; Tamburrino, Marijo B; Calabrese, Joseph R; Galea, Sandro

    2014-10-01

    Posttraumatic stress disorder (PTSD), while highly prevalent (7.6% over a lifetime), develops only in a subset of trauma-exposed individuals. Genetic risk factors in interaction with trauma exposure have been implicated in PTSD vulnerability. To examine the association of 3755 candidate gene single-nucleotide polymorphisms with PTSD development in interaction with a history of childhood trauma. Genetic association study in an Ohio National Guard longitudinal cohort (n = 810) of predominantly male soldiers of European ancestry, with replication in an independent Grady Trauma Project (Atlanta, Georgia) cohort (n = 2083) of predominantly female African American civilians. Continuous measures of PTSD severity, with a modified (interview) PTSD checklist in the discovery cohort and the PTSD Symptom Scale in the replication cohort. Controlling for the level of lifetime adult trauma exposure, we identified the novel association of a single-nucleotide polymorphism within the promoter region of the ADRB2 (Online Mendelian Inheritance in Man 109690) gene with PTSD symptoms in interaction with childhood trauma (rs2400707, P = 1.02 × 10^-5, significant after correction for multiple comparisons). The rs2400707 A allele was associated with relative resilience to childhood adversity. An rs2400707 × childhood trauma interaction predicting adult PTSD symptoms was replicated in the independent predominantly female African American cohort. Altered adrenergic and noradrenergic function has been long believed to have a key etiologic role in PTSD development; however, direct evidence of this link has been missing. The rs2400707 polymorphism has been linked to function of the adrenergic system, but, to our knowledge, this is the first study to date linking the ADRB2 gene to PTSD or any psychiatric disorders. These findings have important implications for PTSD etiology, chronic pain, and stress-related comorbidity, as well as for both primary prevention and treatment

  20. Common variants in SOCS7 gene predict obesity, disturbances in lipid metabolism and insulin resistance.

    Science.gov (United States)

    Tellechea, M L; Steinhardt, A Penas; Rodriguez, G; Taverna, M J; Poskus, E; Frechtel, G

    2013-05-01

    Specific Suppressor of Cytokine Signaling (SOCS) members, such as SOCS7, may play a role in the development of insulin resistance (IR) owing to their ability to inhibit insulin signaling pathways. The objective was to explore the association between common variants and related haplotypes in SOCS7 gene and metabolic traits related to obesity, lipid metabolism and IR. 780 unrelated men were included in a cross-sectional study. We selected three tagged SNPs that capture 100% of SNPs with minor allele frequency ≥ 0.10. Analyses were done separately for each SNP and followed up by haplotype analysis. rs8074124C was associated with both obesity (p = 0.005) and abdominal obesity (p = 0.002) and allele C carriers showed, in comparison with TT carriers, lower BMI (p = 0.001) and waist circumference (p = 0.001). rs8074124 CC carriers showed lower fasting insulin (p = 0.017) and HOMA-IR (p = 0.018) than allele T carriers. rs12051836C was associated with hypertriglyceridemia (p = 0.009) and hypertriglyceridemic waist (p = 0.006). rs12051836 CC carriers showed lower fasting insulin (p = 0.043) and HOMA-IR (p = 0.042). Haplotype-based association analysis (rs8074124 and rs12051836 in that order) showed associations with lipid and obesity-related phenotypes, consistent with single locus analysis. Haplotype analysis also revealed association between haplotype CT and both decreased HDL-C (p = 0.026) and HDL-C (p = 0.014) as a continuous variable. We found, for the first time, significant associations between SOCS7 common variants and related haplotypes and obesity, IR and lipid metabolism disorders. Crown Copyright © 2011. Published by Elsevier B.V. All rights reserved.

  1. A serotonin transporter gene polymorphism predicts peripartum depressive symptoms in an at-risk psychiatric cohort.

    Science.gov (United States)

    Binder, Elisabeth B; Newport, D Jeffrey; Zach, Elizabeth B; Smith, Alicia K; Deveau, Todd C; Altshuler, Lori L; Cohen, Lee S; Stowe, Zachary N; Cubells, Joseph F

    2010-07-01

    Peripartum major depressive disorder (MDD) is a prevalent psychiatric disorder with potential detrimental consequences for both mother and child. Despite its enormous health care relevance, data regarding genetic predictors of peripartum depression are sparse. The aim of this study was to investigate associations of the serotonin-transporter linked polymorphic region (5-HTTLPR) genotype with peripartum MDD in an at-risk population. Two hundred and seventy-four women with a prior history of MDD were genotyped for 5-HTTLPR and serially evaluated in late pregnancy (gestational weeks 31-40), early post-partum (week 1-8) and late post-partum (week 9-24) for diagnosis of a current major depressive episode (MDE) and depressive symptom severity. 5-HTTLPR S-allele carrier status predicted the occurrence of an MDE in the early post-partum period only (OR=5.13, p=0.017). This association persisted despite continued antidepressant treatment. The 5-HTTLPR genotype may be a clinically relevant predictor of early post-partum depression in an at-risk population. Copyright 2009 Elsevier Ltd. All rights reserved.

  2. When stress predicts a shrinking gene pool, trading early reproduction for longevity can increase fitness, even with lower fecundity.

    Directory of Open Access Journals (Sweden)

    William C Ratcliff

    2009-06-01

    Stresses like dietary restriction or various toxins increase lifespan in taxa as diverse as yeast, Caenorhabditis elegans, Drosophila and rats, by triggering physiological responses that also tend to delay reproduction. Food odors can reverse the effects of dietary restriction, showing that key mechanisms respond to information, not just resources. Such environmental cues can predict population trends, not just individual prospects for survival and reproduction. When population size is increasing, each offspring produced earlier makes a larger proportional contribution to the gene pool, but the reverse is true when population size is declining. We show mathematically that natural selection can favor facultative delay in reproduction when environmental cues predict a decrease in total population size, even if lifetime fecundity decreases with delay. We also show that increased reproduction from waiting for better conditions does not increase fitness (proportional representation) when the whole population benefits similarly. We conclude that the beneficial effects of stress on longevity (hormesis) in diverse taxa are a side-effect of delaying reproduction in response to environmental cues that population size is likely to decrease. The reversal by food odors of the effects of dietary restriction can be explained as a response to information that population size is less likely to decrease, reducing the chance that delaying reproduction will increase fitness.

  3. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    Science.gov (United States)

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed that a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein within 'Hongyang' The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and

  4. Progress and challenges in the computational prediction of gene function using networks [v1; ref status: indexed, http://f1000r.es/SqmJUM]

    Directory of Open Access Journals (Sweden)

    Paul Pavlidis

    2012-09-01

    In this opinion piece, we attempt to unify recent arguments we have made that serious confounds affect the use of network data to predict and characterize gene function. The development of computational approaches to determine gene function is a major strand of computational genomics research. However, progress beyond using BLAST to transfer annotations has been surprisingly slow. We have previously argued that a large part of the reported success in using "guilt by association" in network data is due to the tendency of methods to simply assign new functions to already well-annotated genes. While such predictions will tend to be correct, they are generic; it is true, but not very helpful, that a gene with many functions is more likely to have any function. We have also presented evidence that much of the remaining performance in cross-validation cannot be usefully generalized to new predictions, making progressive improvement in analysis difficult to engineer. Here we summarize our findings about how these problems will affect network analysis, discuss some ongoing responses within the field to these issues, and consolidate some recommendations and speculation, which we hope will modestly increase the reliability and specificity of gene function prediction.

  5. The effects of lymph node status on predicting outcome in ER+ /HER2- tamoxifen treated breast cancer patients using gene signatures

    International Nuclear Information System (INIS)

    Cockburn, Jessica G.; Hallett, Robin M.; Gillgrass, Amy E.; Dias, Kay N.; Whelan, T.; Levine, M. N.; Hassell, John A.; Bane, Anita

    2016-01-01

    Lymph node (LN) status is the most important prognostic variable used to guide ER positive (+) breast cancer treatment. While a positive nodal status is traditionally associated with a poor prognosis, a subset of these patients respond well to treatment and achieve long-term survival. Several gene signatures have been established as a means of predicting outcome of breast cancer patients, but the development and indication for use of these assays varies. Here we compare the capacity of two approved gene signatures and a third novel signature to predict outcome in distinct LN negative (-) and LN+ populations. We also examine biological differences between tumours associated with LN- and LN+ disease. Gene expression data from publicly available data sets was used to compare the ability of Oncotype DX and Prosigna to predict Distant Metastasis Free Survival (DMFS) using an in silico platform. A novel gene signature (Ellen) was developed by including patients with both LN- and LN+ disease and using Prediction Analysis of Microarrays (PAM) software. Gene Set Enrichment Analysis (GSEA) was used to determine biological pathways associated with patient outcome in both LN- and LN+ tumors. The Oncotype DX gene signature, which only used LN- patients during development, significantly predicted outcome in LN- patients, but not LN+ patients. The Prosigna gene signature, which included both LN- and LN+ patients during development, predicted outcome in both LN- and LN+ patient groups. Ellen was also able to predict outcome in both LN- and LN+ patient groups. GSEA suggested that epigenetic modification may be related to poor outcome in LN- disease, whereas immune response may be related to good outcome in LN+ disease. We demonstrate the importance of incorporating lymph node status during the development of prognostic gene signatures. Ellen may be a useful tool to predict outcome of patients regardless of lymph node status, or for those with unknown lymph node status. Finally we

  6. MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

    Science.gov (United States)

    Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

    2018-03-10

    Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
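
    The F-measure quoted in this abstract combines precision and recall of predicted GO terms. A minimal sketch for a single protein at a single confidence cutoff is given below; it assumes the plain harmonic-mean definition, and the protein-averaging and cutoff-scanning details of the actual benchmark are not reproduced. The scores, the true annotation set and the function names are hypothetical.

```python
def f_measure(predicted, actual):
    """Harmonic mean of precision and recall for two sets of GO terms."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)

def predict_terms(scores, threshold):
    """Keep the GO terms whose confidence score reaches the threshold."""
    return {term for term, s in scores.items() if s >= threshold}

# Hypothetical confidence scores and experimentally known annotations.
scores = {"GO:0003677": 0.9, "GO:0005634": 0.6, "GO:0016020": 0.2}
actual = {"GO:0003677", "GO:0005634", "GO:0006355"}
f_at_half = f_measure(predict_terms(scores, 0.5), actual)
print(round(f_at_half, 3))
```

    Lowering the cutoff admits more terms, which tends to raise recall at the expense of precision; the F-measure rewards a balance between the two.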

  7. Salivary testosterone and a trinucleotide (CAG) length polymorphism in the androgen receptor gene predict amygdala reactivity in men.

    Science.gov (United States)

    Manuck, Stephen B; Marsland, Anna L; Flory, Janine D; Gorka, Adam; Ferrell, Robert E; Hariri, Ahmad R

    2010-01-01

    In studies employing functional magnetic resonance imaging (fMRI), reactivity of the amygdala to threat-related sensory cues (viz., facial displays of negative emotion) has been found to correlate positively with interindividual variability in testosterone levels of women and young men and to increase on acute administration of exogenous testosterone. Many of the biological actions of testosterone are mediated by intracellular androgen receptors (ARs), which exert transcriptional control of androgen-dependent genes and are expressed in various regions of the brain, including the amygdala. Transactivation potential of the AR decreases (yielding relative androgen insensitivity) with expansion of a polyglutamine stretch in the N-terminal domain of the AR protein, as encoded by a trinucleotide (CAG) repeat polymorphism in exon 1 of the X-chromosome AR gene. Here we examined whether amygdala reactivity to threat-related facial expressions (fear, anger) differs as a function of AR CAG length variation and endogenous (salivary) testosterone in a mid-life sample of 41 healthy men (mean age=45.6 years, range: 34-54 years; CAG repeats, range: 19-29). Testosterone correlated inversely with participant age (r=-0.39, p=0.012) and positively with number of CAG repeats (r=0.45, p=0.003). In partial correlations adjusted for testosterone level, reactivity in the ventral amygdala was lowest among men with the largest number of CAG repeats. This inverse association was seen in both the right (r(p)=-0.34, p [...] left (r(p)=-0.32, p [...] differences in salivary testosterone, also in right (r=0.40, p [...] left (r=0.32, p [...] differences in salivary testosterone also predicted dorsal amygdala reactivity and did so independently of CAG repeats, it is suggested that androgenic influences within this anatomically distinct region may be mediated, in part, by non-genomic or AR-independent mechanisms.

  8. Meta-Prediction of MTHFR Gene Polymorphisms and Air Pollution on the Risk of Hypertensive Disorders in Pregnancy Worldwide

    Directory of Open Access Journals (Sweden)

    Ya-Ling Yang

    2018-02-01

    Hypertensive disorders in pregnancy (HDP) are devastating health hazards for both women and children. Both methylenetetrahydrofolate reductase (MTHFR) gene polymorphisms and air pollution can affect health status and result in increased risk of HDP for women. The major objective of this study was to investigate the effect of MTHFR polymorphisms, air pollution, and their interaction on the risk of HDP by using meta-predictive analytics. We searched various databases comprehensively to access all available studies conducted for various ethnic populations from countries worldwide, from 1997 to 2017. Seventy-one studies with 8064 cases and 13,232 controls for MTHFR C677T and 11 studies with 1425 cases and 1859 controls for MTHFR A1298C were included. MTHFR C677T homozygous TT (risk ratio (RR) = 1.28, p < 0.0001) and CT plus TT (RR = 1.07, p = 0.0002) were the risk genotypes, while wild-type CC played a protective role (RR = 0.94, p = 0.0017) for HDP. The meta-predictive analysis found that the percentage of MTHFR C677T TT plus CT (p = 0.044) and CT (p = 0.043) genotypes in the HDP case group were significantly increased with elevated levels of air pollution worldwide. Additionally, in countries with higher air pollution levels, the pregnant women with wild-type CC MTHFR 677 had a protection effect against HDP (p = 0.014), whereas the homozygous TT of MTHFR C677T polymorphism was a risk genotype for developing HDP. Air pollution level is an environmental factor interacting with increased MTHFR C677T polymorphisms, impacting the susceptibility of HDP for women.

  9. Development and validation of a gene profile predicting benefit of postmastectomy radiotherapy in patients with high-risk breast cancer: a study of gene expression in the DBCG82bc cohort.

    Science.gov (United States)

    Tramm, Trine; Mohammed, Hayat; Myhre, Simen; Kyndi, Marianne; Alsner, Jan; Børresen-Dale, Anne-Lise; Sørlie, Therese; Frigessi, Arnoldo; Overgaard, Jens

    2014-10-15

    The aim was to identify genes predicting benefit of radiotherapy in patients with high-risk breast cancer treated with systemic therapy and randomized to receive or not receive postmastectomy radiotherapy (PMRT). The study was based on the Danish Breast Cancer Cooperative Group (DBCG82bc) cohort. Gene-expression analysis was performed in a training set of frozen tumor tissue from 191 patients. Genes were identified through the Lasso method with the endpoint being locoregional recurrence (LRR). A weighted gene-expression index (DBCG-RT profile) was calculated and transferred to quantitative real-time PCR (qRT-PCR) in corresponding formalin-fixed, paraffin-embedded (FFPE) samples, before validation in FFPE from 112 additional patients. Seven genes were identified, and the derived DBCG-RT profile divided the 191 patients into "high LRR risk" and "low LRR risk" groups. PMRT significantly reduced risk of LRR in "high LRR risk" patients, whereas "low LRR risk" patients showed no additional reduction in LRR rate. Technical transfer of the DBCG-RT profile to FFPE/qRT-PCR was successful, and the predictive impact was successfully validated in another 112 patients. A DBCG-RT gene profile was identified and validated, identifying patients with very low risk of LRR and no benefit from PMRT. The profile may provide a method to individualize treatment with PMRT. ©2014 American Association for Cancer Research.
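
    A weighted gene-expression index of the kind described in this abstract can be sketched as a weighted sum followed by a cutoff. Everything specific below is a placeholder: the gene names, weights and cutoff are invented for illustration and are not the published DBCG-RT profile, and the assumption that a higher index means higher risk is ours.

```python
def expression_index(expression, weights):
    """Weighted sum of expression values over the genes in the profile."""
    missing = [g for g in weights if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(weights[g] * expression[g] for g in weights)

def risk_group(index, cutoff):
    """Assumed convention: an index above the cutoff marks high risk."""
    return "high LRR risk" if index > cutoff else "low LRR risk"

# Placeholder gene names, weights and cutoff; NOT the published profile.
weights = {"gene_a": 0.8, "gene_b": -0.5, "gene_c": 0.3}
patient = {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 1.0}
index = expression_index(patient, weights)
group = risk_group(index, cutoff=1.0)
print(round(index, 2), group)
```

    In a Lasso-derived profile most candidate genes receive a weight of exactly zero, so only the few retained genes need to be measured when the index is transferred to another assay.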

  10. Gene Expression Profiling to Predict Clinical Outcome of Breast Cancer: reproducing, analyzing and extending the Nature publication by van 't Veer et al

    NARCIS (Netherlands)

    Li R.; Visser, H.M.

    2010-01-01

    Chemotherapy and hormonal therapy as adjuvant systemic therapies to inhibit breast cancer recurrence are not necessary for each patient. In Veer's paper "Gene expression profiling predicts clinical outcome of breast cancer" (Nature 2002, PMID: 11823860), they introduced a method based on DNA

  11. Glucagon-like peptides GLP-1 and GLP-2, predicted products of the glucagon gene, are secreted separately from pig small intestine but not pancreas

    DEFF Research Database (Denmark)

    Holst, J J; Poulsen, Steen Seier

    1986-01-01

    We developed specific antibodies and RIAs for glucagon-like peptides 1 and 2 (GLP-1 and GLP-2), two predicted products of the glucagon gene, and studied the occurrence, nature, and secretion of immunoreactive GLP-1 and GLP-2 in pig pancreas and small intestine. Immunoreactive GLP-1 and GLP-2 were...

  12. Predictive value of EGFR overexpression and gene amplification on icotinib efficacy in patients with advanced esophageal squamous cell carcinoma.

    Science.gov (United States)

    Wang, Xi; Niu, Haitao; Fan, Qingxia; Lu, Ping; Ma, Changwu; Liu, Wei; Liu, Ying; Li, Weiwei; Hu, Shaoxuan; Ling, Yun; Guo, Lei; Ying, Jianming; Huang, Jing

    2016-04-26

    This study aimed to search for a molecular marker for the targeted epidermal growth factor receptor (EGFR) inhibitor icotinib by analyzing protein expression and amplification of the EGFR proto-oncogene in esophageal squamous cell carcinoma (ESCC) patients. Immunohistochemistry and fluorescence in situ hybridization (FISH) were used to assess EGFR expression and gene amplification status in 193 patients with ESCC. We also examined the association between EGFR overexpression and the efficacy of a novel EGFR TKI, icotinib, in 62 ESCC patients. Of the 193 patients, 95 (49.2%) patients showed EGFR overexpression (3+), and 47 (24.4%) patients harbored EGFR FISH positivity. EGFR overexpression was significantly correlated with clinical stage and lymph node metastasis (p [...] icotinib, the response rate was 17.6% for patients with high EGFR-expressing tumors, which was markedly higher than the rate (0%) for patients with low to moderate EGFR-expressing tumors (p=0.341). Furthermore, all cases that responded to icotinib showed EGFR overexpression. In conclusion, our study suggests that EGFR overexpression might potentially be used in predicting the efficacy in patients treated with icotinib. These data have implications for both clinical trial design and therapeutic strategies.

  13. Meta-Prediction of MTHFR Gene Polymorphisms and Air Pollution on the Risk of Hypertensive Disorders in Pregnancy Worldwide.

    Science.gov (United States)

    Yang, Ya-Ling; Yang, Hsiao-Ling; Shiao, S Pamela K

    2018-02-13

    Hypertensive disorders in pregnancy (HDP) are devastating health hazards for both women and children. Both methylenetetrahydrofolate reductase (MTHFR) gene polymorphisms and air pollution can affect health status and result in increased risk of HDP for women. The major objective of this study was to investigate the effect of MTHFR polymorphisms, air pollution, and their interaction on the risk of HDP by using meta-predictive analytics. We searched various databases comprehensively to access all available studies conducted for various ethnic populations from countries worldwide, from 1997 to 2017. Seventy-one studies with 8064 cases and 13,232 controls for MTHFR C677T and 11 studies with 1425 cases and 1859 controls for MTHFR A1298C were included. MTHFR C677T homozygous TT (risk ratio (RR) = 1.28, p worldwide. Additionally, in countries with higher air pollution levels, the pregnant women with wild-type CC MTHFR 677 had a protection effect against HDP (p = 0.014), whereas the homozygous TT of MTHFR C677T polymorphism was a risk genotype for developing HDP. Air pollution level is an environmental factor interacting with increased MTHFR C677T polymorphisms, impacting the susceptibility of HDP for women.

  14. Study on predictive role of AR and EGFR family genes with response to neoadjuvant chemotherapy in locally advanced breast cancer in Indian women.

    Science.gov (United States)

    Singh, L C; Chakraborty, Anurupa; Mishra, Ashwani K; Devi, Thoudam Regina; Sugandhi, Nidhi; Chintamani, Chintamani; Bhatnagar, Dinesh; Kapur, Sujala; Saxena, Sunita

    2012-06-01

    Locally advanced breast cancer (LABC) remains a clinical challenge as the majority of patients with this diagnosis develop distant metastases despite appropriate therapy. We analyzed expression of steroid and growth hormone receptor genes as well as genes associated with metabolism of chemotherapeutic drugs in locally advanced breast cancer before and after neoadjuvant chemotherapy (NACT) to study whether there is a change in gene expression induced by chemotherapy and whether such changes are associated with tumor response or non-response. Fifty patients were included with locally advanced breast cancer treated with cyclophosphamide, adriamycin, 5-fluorouracil (CAF)-based neoadjuvant chemotherapy before surgery. Total RNA was extracted from 50 matched samples of pre- and post-NACT tumor tissues. RNA expression levels of epidermal growth factor receptor family genes including EGFR, ERBB2, ERBB3, androgen receptor (AR), and multidrug-resistance gene 1 (MDR1) were determined by quantitative real-time reverse transcriptase-polymerase chain reaction. Responders show significantly high levels of pre-NACT AR gene expression (P = 0.016), which reduces following NACT (P = 0.008), and hence can serve as a useful tool for the prediction of the success of neoadjuvant chemotherapy in individual cancer patients with locally advanced breast carcinoma. Moreover, a significant post-therapeutic increase in the expression levels of EGFR and MDR1 gene in responders (P = 0.026 and P < 0.001) as well as in non-responders (P = 0.055, P = 0.001) suggests that expression of these genes changes during therapy but they do not have any impact on tumor response, whereas a post-therapeutic reduction was observed in AR in responders. This indicates an independent predictive role of AR with response to NACT.

  15. Further increased production of free fatty acids by overexpressing a predicted transketolase gene of the pentose phosphate pathway in Aspergillus oryzae faaA disruptant.

    Science.gov (United States)

    Tamano, Koichi; Miura, Ai

    2016-09-01

    Free fatty acids are useful as source materials for the production of biodiesel fuel and various chemicals such as pharmaceuticals and dietary supplements. Previously, we attained a 9.2-fold increase in free fatty acid productivity by disrupting a predicted acyl-CoA synthetase gene (faaA, AO090011000642) in Aspergillus oryzae. In this study, we achieved a further increase in productivity by overexpressing a predicted transketolase gene of the pentose phosphate pathway in the faaA disruptant. The A. oryzae genome is predicted to have three transketolase genes, and overexpression of AO090023000345, one of the three genes, resulted in phenotypic change and a further 1.4-fold increase in free fatty acids (corresponding to an increased production of 0.38 mmol/g dry cell weight) compared to the faaA disruptant. Additionally, the biomass of hyphae increased 1.2-fold with the overexpression. As a result, free fatty acid production yield per liter of liquid culture increased 1.7-fold with the overexpression.
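
    The three fold changes quoted in this abstract are mutually consistent if the per-liter yield is treated as the product of per-cell productivity and biomass. The following is a minimal arithmetic check under that assumption; the variable names are illustrative and do not come from the paper.

```python
# Fold changes reported in the abstract, relative to the faaA disruptant.
per_cell_fold = 1.4  # free fatty acids per gram dry cell weight
biomass_fold = 1.2   # hyphal biomass

# Assumption (not stated in the abstract): yield per liter scales as
# per-cell productivity multiplied by biomass per liter.
per_liter_fold = per_cell_fold * biomass_fold

print(f"Expected per-liter fold change: {per_liter_fold:.2f}")
```

    Rounded to one decimal place, the product matches the 1.7-fold per-liter increase reported above.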

  16. A gene expression signature of RAS pathway dependence predicts response to PI3K and RAS pathway inhibitors and expands the population of RAS pathway activated tumors.

    Science.gov (United States)

    Loboda, Andrey; Nebozhyn, Michael; Klinghoffer, Rich; Frazier, Jason; Chastain, Michael; Arthur, William; Roberts, Brian; Zhang, Theresa; Chenard, Melissa; Haines, Brian; Andersen, Jannik; Nagashima, Kumiko; Paweletz, Cloud; Lynch, Bethany; Feldman, Igor; Dai, Hongyue; Huang, Pearl; Watters, James

    2010-06-30

    Hyperactivation of the Ras signaling pathway is a driver of many cancers, and RAS pathway activation can predict response to targeted therapies. Therefore, optimal methods for measuring Ras pathway activation are critical. The main focus of our work was to develop a gene expression signature that is predictive of RAS pathway dependence. We used the coherent expression of RAS pathway-related genes across multiple datasets to derive a RAS pathway gene expression signature and generate RAS pathway activation scores in pre-clinical cancer models and human tumors. We then related this signature to KRAS mutation status and drug response data in pre-clinical and clinical datasets. The RAS signature score is predictive of KRAS mutation status in lung tumors and cell lines with high (> 90%) sensitivity but relatively low (50%) specificity due to samples that have apparent RAS pathway activation in the absence of a KRAS mutation. In lung and breast cancer cell line panels, the RAS pathway signature score correlates with pMEK and pERK expression, and predicts resistance to AKT inhibition and sensitivity to MEK inhibition within both KRAS mutant and KRAS wild-type groups. The RAS pathway signature is upregulated in breast cancer cell lines that have acquired resistance to AKT inhibition, and is downregulated by inhibition of MEK. In lung cancer cell lines knockdown of KRAS using siRNA demonstrates that the RAS pathway signature is a better measure of dependence on RAS compared to KRAS mutation status. In human tumors, the RAS pathway signature is elevated in ER negative breast tumors and lung adenocarcinomas, and predicts resistance to cetuximab in metastatic colorectal cancer. These data demonstrate that the RAS pathway signature is superior to KRAS mutation status for the prediction of dependence on RAS signaling, can predict response to PI3K and RAS pathway inhibitors, and is likely to have the most clinical utility in lung and breast tumors.
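
    As an illustration of how sensitivity and specificity figures like those quoted above are computed, here is a minimal sketch. The sample counts are invented for the example and are not data from the study; the function name is likewise hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for two equal-length boolean lists.

    truth[i]     -- True if sample i carries the mutation (reference label)
    predicted[i] -- True if the signature score calls sample i positive
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort: 20 mutant and 20 wild-type samples.
# The signature flags 19 of the mutants and 10 of the wild-type samples.
truth = [True] * 20 + [False] * 20
predicted = [True] * 19 + [False] * 1 + [True] * 10 + [False] * 10

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    In this toy cohort, the wild-type samples that are flagged positive lower the specificity, which mirrors the abstract's explanation that some samples show apparent RAS pathway activation without a KRAS mutation.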

  17. A gene expression signature of RAS pathway dependence predicts response to PI3K and RAS pathway inhibitors and expands the population of RAS pathway activated tumors

    Directory of Open Access Journals (Sweden)

    Paweletz Cloud

    2010-06-01

    Background: Hyperactivation of the Ras signaling pathway is a driver of many cancers, and RAS pathway activation can predict response to targeted therapies. Therefore, optimal methods for measuring Ras pathway activation are critical. The main focus of our work was to develop a gene expression signature that is predictive of RAS pathway dependence. Methods: We used the coherent expression of RAS pathway-related genes across multiple datasets to derive a RAS pathway gene expression signature and generate RAS pathway activation scores in pre-clinical cancer models and human tumors. We then related this signature to KRAS mutation status and drug response data in pre-clinical and clinical datasets. Results: The RAS signature score is predictive of KRAS mutation status in lung tumors and cell lines with high (> 90%) sensitivity but relatively low (50%) specificity due to samples that have apparent RAS pathway activation in the absence of a KRAS mutation. In lung and breast cancer cell line panels, the RAS pathway signature score correlates with pMEK and pERK expression, and predicts resistance to AKT inhibition and sensitivity to MEK inhibition within both KRAS mutant and KRAS wild-type groups. The RAS pathway signature is upregulated in breast cancer cell lines that have acquired resistance to AKT inhibition, and is downregulated by inhibition of MEK. In lung cancer cell lines knockdown of KRAS using siRNA demonstrates that the RAS pathway signature is a better measure of dependence on RAS compared to KRAS mutation status. In human tumors, the RAS pathway signature is elevated in ER negative breast tumors and lung adenocarcinomas, and predicts resistance to cetuximab in metastatic colorectal cancer. Conclusions: These data demonstrate that the RAS pathway signature is superior to KRAS mutation status for the prediction of dependence on RAS signaling, can predict response to PI3K and RAS pathway inhibitors, and is likely to have the most clinical

  18. Dopamine and the Creative Mind: Individual Differences in Creativity Are Predicted by Interactions between Dopamine Genes DAT and COMT.

    Science.gov (United States)

    Zabelina, Darya L; Colzato, Lorenza; Beeman, Mark; Hommel, Bernhard

    2016-01-01

    The dopaminergic (DA) system may be involved in creativity; however, results of past studies are mixed. We attempted to clarify this putative relation by considering the mediofrontal and the nigrostriatal DA pathways, uniquely and in combination, and their contribution to two different measures of creativity: an abbreviated version of the Torrance Test of Creative Thinking, assessing divergent thinking, and a real-world creative achievement index. We found that creativity can be predicted from interactions between genetic polymorphisms related to frontal (COMT) and striatal (DAT) DA pathways. Importantly, the Torrance test and the real-world creative achievement index related to different genetic patterns, suggesting that these two measures tap into different aspects of creativity, and depend on distinct, but interacting, DA sub-systems. Specifically, we report that successful performance on the Torrance test is linked with dopaminergic polymorphisms associated with good cognitive flexibility and medium top-down control, or with weak cognitive flexibility and strong top-down control. The latter is particularly true for the originality factor of divergent thinking. High real-world creative achievement, on the other hand, as assessed by the Creative Achievement Questionnaire, is linked with dopaminergic polymorphisms associated with weak cognitive flexibility and weak top-down control. Taken altogether, our findings support the idea that human creativity relies on dopamine, and on the interaction between frontal and striatal dopaminergic pathways in particular. This interaction may help clarify some apparent inconsistencies in the prior literature, especially if the genes and/or creativity measures were analyzed separately.

  19. Promoter hypermethylation of mismatch repair gene hMLH1 predicts the clinical response of malignant astrocytomas to nitrosourea.

    Science.gov (United States)

    Fukushima, Takao; Katayama, Yoichi; Watanabe, Takao; Yoshino, Atsuo; Ogino, Akiyoshi; Ohta, Takashi; Komine, Chiaki

    2005-02-15

    In certain types of human cancers, transcriptional inactivation of hMLH1 by promoter hypermethylation plays a causal role in the loss of mismatch repair functions that modulate cytotoxic pathways in response to DNA-damaging agents. The aim of the present study was to investigate the role of promoter methylation of the hMLH1 gene in malignant astrocytomas. We examined the hMLH1 promoter methylation in a homogeneous cohort of patients with 41 malignant astrocytomas treated by 1-(4-amino-2-methyl-5-pyrimidinyl)methyl-3-(2-chloroethyl)-3-nitrosourea chemotherapy in combination with radiation and interferon therapy, and assessed the correlation of such methylation with clinical outcome. hMLH1 promoter methylation was found in 6 (15%) of the 41 newly diagnosed malignant astrocytomas. Hypermethylation of the hMLH1 promoter corresponded closely with a loss of immunohistochemical staining for hMLH1 protein (P = 0.0013). Patients with hMLH1-methylated tumors displayed a greater chance of responding to adjuvant therapy as compared with those with hMLH1-unmethylated tumors (P = 0.0150). The presence of hMLH1 hypermethylation was significantly associated with a longer progression-free survival on both univariate analysis (P = 0.0340) and multivariate analysis (P = 0.0161). The present study identified hMLH1 methylation status as a predictor of the clinical response of malignant astrocytomas to chloroethylnitrosourea-based adjuvant therapy. The findings obtained suggest that determination of the methylation status of hMLH1 could provide a potential basis for designing rational chemotherapeutic strategies, as well as for predicting prognosis.

  20. A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network

    Directory of Open Access Journals (Sweden)

    Han Kyungsook

    2010-06-01

    Background: Genetic interaction profiles are highly informative and helpful for understanding the functional linkages between genes, and therefore have been extensively exploited for annotating gene functions and dissecting specific pathway structures. However, our understanding is rather limited to the relationship between double concurrent perturbation and various higher level phenotypic changes, e.g. those in cells, tissues or organs. Modifier screens, such as synthetic genetic arrays (SGA), can help us to understand the phenotype caused by combined gene mutations. Unfortunately, exhaustive tests on all possible combined mutations in any genome are vulnerable to combinatorial explosion and are infeasible either technically or financially. Therefore, an accurate computational approach to predict genetic interaction is highly desirable, and such methods have the potential of alleviating the bottleneck on experiment design. Results: In this work, we introduce a computational systems biology approach for the accurate prediction of pairwise synthetic genetic interactions (SGI). First, a high-coverage and high-precision functional gene network (FGN) is constructed by integrating protein-protein interaction (PPI), protein complex and gene expression data; then, a graph-based semi-supervised learning (SSL) classifier is utilized to identify SGI, where the topological properties of protein pairs in the weighted FGN are used as input features of the classifier. We compare the proposed SSL method with the state-of-the-art supervised classifier, the support vector machine (SVM), on a benchmark dataset in S. cerevisiae to validate our method's ability to distinguish synthetic genetic interactions from non-interaction gene pairs. Experimental results show that the proposed method can accurately predict genetic interactions in S. cerevisiae (with a sensitivity of 92% and specificity of 91%). Noticeably, the SSL method is more efficient than SVM, especially for

  1. Enrichment of conserved synaptic activity-responsive element in neuronal genes predicts a coordinated response of MEF2, CREB and SRF.

    Directory of Open Access Journals (Sweden)

    Fernanda M Rodríguez-Tornos

    A unique synaptic activity-responsive element (SARE) sequence, composed of the consensus binding sites for SRF, MEF2 and CREB, is necessary for control of transcriptional upregulation of the Arc gene in response to synaptic activity. We hypothesize that this sequence is a broad mechanism that regulates gene expression in response to synaptic activation and during plasticity; and that analysis of SARE-containing genes could identify molecular mechanisms involved in brain disorders. To search for conserved SARE sequences in the mammalian genome, we used the SynoR in silico tool, and found the SARE cluster predominantly in the regulatory regions of genes expressed specifically in the nervous system; most were related to neural development and homeostatic maintenance. Two of these SARE sequences were tested in luciferase assays and proved to promote transcription in response to neuronal activation. Supporting the predictive capacity of our candidate list, up-regulation of several SARE-containing genes in response to neuronal activity was validated using external data and also experimentally using primary cortical neurons and quantitative real time RT-PCR. The list of SARE-containing genes includes several linked to mental retardation and cognitive disorders, and is significantly enriched in genes that encode mRNA targeted by FMRP (fragile X mental retardation protein). Our study thus supports the idea that SARE sequences are relevant transcriptional regulatory elements that participate in plasticity. In addition, it offers a comprehensive view of how activity-responsive transcription factors coordinate their actions and increase the selectivity of their targets. Our data suggest that analysis of SARE-containing genes will reveal yet-undescribed pathways of synaptic plasticity and additional candidate genes disrupted in mental disease.

  2. Prediction and characterisation of a highly conserved, remote and cAMP responsive enhancer that regulates Msx1 gene expression in cardiac neural crest and outflow tract.

    Science.gov (United States)

    Miller, Kerry Ann; Davidson, Scott; Liaros, Angela; Barrow, John; Lear, Marissa; Heine, Danielle; Hoppler, Stefan; MacKenzie, Alasdair

    2008-05-15

    Double knockouts of the Msx1 and Msx2 genes in the mouse result in severe cardiac outflow tract malformations similar to those frequently found in newborn infants. Despite the known role of the Msx genes in cardiac formation little is known of the regulatory systems (ligand receptor, signal transduction and protein-DNA interactions) that regulate the tissue-specific expression of the Msx genes in mammals during the formation of the outflow tract. In the present study we have used a combination of multi-species comparative genomics, mouse transgenic analysis and in-situ hybridisation to predict and validate the existence of a remote ultra-conserved enhancer that supports the expression of the Msx1 gene in migrating mouse cardiac neural crest and the outflow tract primordia. Furthermore, culturing of embryonic explants derived from transgenic lines with agonists of the PKC and PKA signal transduction systems demonstrates that this remote enhancer is influenced by PKA but not PKC dependent gene regulatory systems. These studies demonstrate the efficacy of combining comparative genomics and transgenic analyses and provide a platform for the study of the possible roles of Msx gene mis-regulation in the aetiology of congenital heart malformation.

  3. Perturbation of B Cell Gene Expression Persists in HIV-Infected Children Despite Effective Antiretroviral Therapy and Predicts H1N1 Response.

    Science.gov (United States)

    Cotugno, Nicola; De Armas, Lesley; Pallikkuth, Suresh; Rinaldi, Stefano; Issac, Biju; Cagigi, Alberto; Rossi, Paolo; Palma, Paolo; Pahwa, Savita

    2017-01-01

    Despite effective antiretroviral therapy (ART), HIV-infected individuals with apparently similar clinical and immunological characteristics can vary in responsiveness to vaccinations. However, molecular mechanisms responsible for such impairment, as well as biomarkers able to predict vaccine responsiveness in HIV-infected children, remain unknown. Following the hypothesis that a B cell qualitative impairment persists in HIV-infected children (HIV) despite effective ART and phenotypic B cell immune reconstitution, the aim of the current study was to investigate B cell gene expression of HIV compared to age-matched healthy controls (HCs) and to determine whether distinct gene expression patterns could predict the ability to respond to influenza vaccine. To do so, we analyzed prevaccination transcriptional levels of a 96-gene panel in equal numbers of sort-purified B cell subsets (SPBS) isolated from peripheral blood mononuclear cells using multiplexed RT-PCR. Immune responses to H1N1 antigen were determined by hemagglutination inhibition and memory B cell ELISpot assays following trivalent-inactivated influenza vaccination (TIV) for all study participants. Although there were no differences in terms of cell frequencies of SPBS between HIV and HC, the groups were distinguishable based upon gene expression analyses. Indeed, a 28-gene signature, characterized by higher expression of genes involved in the inflammatory response and immune activation, was observed in activated memory B cells (CD27+ CD21-) from HIV when compared to HC despite long-term viral control (>24 months). Further analysis, taking into account H1N1 responses after TIV in HIV participants, revealed that a 25-gene signature in resting memory (RM) B cells (CD27+ CD21+) was able to distinguish vaccine responders from non-responders (NR). In fact, prevaccination RM B cells of responders showed a higher expression of gene sets involved in B cell adaptive immune responses (APRIL, BTK, BLIMP1) and

  4. Target genes prediction and functional analysis of microRNAs differentially expressed in gastric cancer stem cells MKN-45

    Directory of Open Access Journals (Sweden)

    Zohreh Salehi

    2017-01-01

    Conclusions: Bioinformatics analyses using the DAVID database, GO biological process, GO molecular function, Kyoto Encyclopedia of Genes and Genomes pathways, BioCarta pathway, Panther pathway, and Reactome pathway revealed that target genes of differentially expressed miRNAs in gastric CSCs were connected to pivotal biological pathways involved in cell cycle regulation, stemness properties, and differentiation.

  5. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    Science.gov (United States)

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.

  6. The Mapping of Predicted Triplex DNA:RNA in the Drosophila Genome Reveals a Prominent Location in Development- and Morphogenesis-Related Genes

    Directory of Open Access Journals (Sweden)

    Claude Pasquier

    2017-07-01

    Double-stranded DNA is able to form triple-helical structures by accommodating a third nucleotide strand. A nucleic acid triplex occurs according to Hoogsteen rules that predict the stability and affinity of the third strand bound to the Watson–Crick duplex. The “triplex-forming oligonucleotide” (TFO can be a short sequence of RNA that binds to the major groove of the targeted duplex only when this duplex presents a sequence of purine or pyrimidine bases in one of the DNA strands. Many nuclear proteins are known to bind triplex DNA or DNA:RNA, but their biological functions are unexplored. We identified sequences that are capable of engaging as the “triplex-forming oligonucleotide” in both the pre-lncRNA and pre-mRNA collections of Drosophila melanogaster. These motifs were matched against the Drosophila genome in order to identify putative sequences of triplex formation in intergenic regions, promoters, and introns/exons. Most of the identified TFOs appear to be located in the intronic region of the analyzed genes. Computational prediction of the most targeted genes by TFOs originating from pre-lncRNAs and pre-mRNAs revealed that they are restrictively associated with development- and morphogenesis-related gene networks. The refined analysis by Gene Ontology enrichment demonstrates that some individual TFOs present genome-wide scale matches that are located in numerous genes and regulatory sequences. The triplex DNA:RNA computational mapping at the genome-wide scale suggests broad interference in the regulatory process of the gene networks orchestrated by TFO RNAs acting in association simultaneously at multiple sites.

  7. Integrative miRNA-Gene Expression Analysis Enables Refinement of Associated Biology and Prediction of Response to Cetuximab in Head and Neck Squamous Cell Cancer

    Directory of Open Access Journals (Sweden)

    Loris De Cecco

    2017-01-01

    This paper documents the process by which we, through gene and miRNA expression profiling of the same samples of head and neck squamous cell carcinomas (HNSCC) and an integrative miRNA-mRNA expression analysis, were able to identify candidate biomarkers of progression-free survival (PFS) in patients treated with cetuximab-based approaches. Through sparse partial least square–discriminant analysis (sPLS-DA) and supervised analysis, 36 miRNAs were identified in two components that clearly separated long- and short-PFS patients. Gene set enrichment analysis identified a significant correlation between the miRNA first-component and EGFR signaling, keratinocyte differentiation, and p53. Another significant correlation was identified between the second component and RAS, NOTCH, immune/inflammatory response, epithelial–mesenchymal transition (EMT), and angiogenesis pathways. Regularized canonical correlation analysis of sPLS-DA miRNA and gene data combined with the MAGIA2 web-tool highlighted 16 miRNAs and 84 genes that were interconnected in a total of 245 interactions. After feature selection by a smoothed t-statistic support vector machine, we identified three miRNAs and five genes in the miRNA-gene network whose expression result was the most relevant in predicting PFS (Area Under the Curve, AUC = 0.992). Overall, using a well-defined clinical setting and up-to-date bioinformatics tools, we are able to give the proof of principle that an integrative miRNA-mRNA expression could greatly contribute to the refinement of the biology behind a predictive model.

  8. ACE I/D Gene Polymorphism Can't Predict the Steroid Responsiveness in Asian Children with Idiopathic Nephrotic Syndrome: A Meta-Analysis

    Science.gov (United States)

    Su, Li-Na; Lei, Feng-Ying; Huang, Wei-Fang; Zhao, Yan-Jun

    2011-01-01

    Background: The results from the published studies on the association between angiotensin-converting enzyme (ACE) insertion/deletion (I/D) gene polymorphism and the treatment response to steroid in Asian children with idiopathic nephrotic syndrome (INS) are still conflicting. This meta-analysis was performed to evaluate the relation between ACE I/D gene polymorphism and treatment response to steroid in Asian children and to explore whether ACE D allele or DD genotype could become a predictive marker for steroid responsiveness. Methodology/Principal Findings: Association studies were identified from the databases of PubMed, Embase, Cochrane Library and CBM-disc (China Biological Medicine Database) as of September 1, 2010, and eligible investigations were synthesized using meta-analysis method. Five investigations were identified for the analysis of association between ACE I/D gene polymorphism and steroid-resistant nephrotic syndrome (SRNS) risk in Asian children and seven studies were included to explore the relationship between ACE I/D gene polymorphism and steroid-sensitive nephrotic syndrome (SSNS) susceptibility. Five investigations were recruited to explore the difference of ACE I/D gene distribution between SRNS and SSNS. There was no marked association between D allele or DD genotype and SRNS susceptibility or SSNS risk, and the gene distribution differences of ACE between SRNS and SSNS were not statistically significant. II genotype might play a positive role against SRNS onset but not for SSNS (OR = 0.51, P = 0.02; OR = 0.95, P = 0.85; respectively); however, the result for the association of II genotype with SRNS risk was not stable. Conclusions/Significance: Our results indicate that D allele or DD homozygous can't become a significant genetic molecular marker to predict the treatment response to steroid in Asian children with INS. PMID:21611163

  9. A novel method of predicting microRNA-disease associations based on microRNA, disease, gene and environment factor networks.

    Science.gov (United States)

    Peng, Wei; Lan, Wei; Zhong, Jiancheng; Wang, Jianxin; Pan, Yi

    2017-07-15

    MicroRNAs have been reported to have a close relationship with diseases due to their deregulation of the expression of target mRNAs. Detecting disease-related microRNAs is helpful for disease therapies. With the development of high throughput experimental techniques, a large number of microRNAs have been sequenced. However, it is still a big challenge to identify which microRNAs are related to diseases. Recently, researchers have become interested in combining multiple sources of biological information to identify the associations between microRNAs and diseases. In this work, we have proposed a novel method to predict the microRNA-disease associations based on four biological properties: microRNA, disease, gene and environment factor. Compared with previous methods, our method makes predictions not only by using the prior knowledge of associations among microRNAs, diseases, environment factors and genes, but also by using the internal relationship among these biological properties. We constructed four biological networks based on the similarity of microRNAs, diseases, environment factors and genes, respectively. Then random walking was implemented on the four networks unequally. In the walking course, the associations can be inferred from the neighbors in the same networks. Meanwhile, the association information can be transferred from one network to another. The results of the experiment showed that our method achieved better prediction performance than other existing state-of-the-art methods. Copyright © 2017 Elsevier Inc. All rights reserved.
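
    The abstract does not give the update rule used for the unequal walk across the four networks. As a simplified illustration of the general idea that association scores spread from known seeds to their network neighbors, the following sketch runs a standard random walk with restart on a single toy network. The graph, the restart probability, and all names are assumptions made for the example, not the authors' method.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score nodes by proximity to seed nodes on one undirected network.

    adj     -- symmetric adjacency matrix as a list of lists (toy input)
    seeds   -- set of node indices with known associations
    restart -- probability of jumping back to a seed at each step
    """
    n = len(adj)
    # Column-normalize so each column holds transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0
              for j in range(n)] for i in range(n)]
    start = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * start[i] for i in range(n)]
        # Stop once the distribution no longer changes appreciably.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical 4-node path network 0-1-2-3 with node 0 as the only seed.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds={0})
print([round(s, 3) for s in scores])
```

    In this toy network, nodes closer to the seed receive higher steady-state scores, which is the sense in which associations can be inferred from neighbors; the method described above additionally transfers information between networks, which this sketch does not attempt.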

  10. A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling.

    Science.gov (United States)

    Klein, Eric A; Cooperberg, Matthew R; Magi-Galluzzi, Cristina; Simko, Jeffry P; Falzarano, Sara M; Maddala, Tara; Chan, June M; Li, Jianbo; Cowan, Janet E; Tsiatis, Athanasios C; Cherbavaz, Diana B; Pelham, Robert J; Tenggara-Hunter, Imelda; Baehner, Frederick L; Knezevic, Dejan; Febbo, Phillip G; Shak, Steven; Kattan, Michael W; Lee, Mark; Carroll, Peter R

    2014-09-01

    Prostate tumor heterogeneity and biopsy undersampling pose challenges to accurate, individualized risk assessment for men with localized disease. To identify and validate a biopsy-based gene expression signature that predicts clinical recurrence, prostate cancer (PCa) death, and adverse pathology. Gene expression was quantified by reverse transcription-polymerase chain reaction for three studies-a discovery prostatectomy study (n=441), a biopsy study (n=167), and a prospectively designed, independent clinical validation study (n=395)-testing retrospectively collected needle biopsies from contemporary (1997-2011) patients with low to intermediate clinical risk who were candidates for active surveillance (AS). The main outcome measures defining aggressive PCa were clinical recurrence, PCa death, and adverse pathology at prostatectomy. Cox proportional hazards regression models were used to evaluate the association between gene expression and time to event end points. Results from the prostatectomy and biopsy studies were used to develop and lock a multigene-expression-based signature, called the Genomic Prostate Score (GPS); in the validation study, logistic regression was used to test the association between the GPS and pathologic stage and grade at prostatectomy. Decision-curve analysis and risk profiles were used together with clinical and pathologic characteristics to evaluate clinical utility. Of the 732 candidate genes analyzed, 288 (39%) were found to predict clinical recurrence despite heterogeneity and multifocality, and 198 (27%) were predictive of aggressive disease after adjustment for prostate-specific antigen, Gleason score, and clinical stage. Further analysis identified 17 genes representing multiple biological pathways that were combined into the GPS algorithm. In the validation study, GPS predicted high-grade (odds ratio [OR] per 20 GPS units: 2.3; 95% confidence interval [CI], 1.5-3.7; p<0.001) and high-stage (OR per 20 GPS units: 1.9; 95% CI, 1

  11. Combined serial analysis of gene expression and transcription factor binding site prediction identifies novel-candidate-target genes of Nr2e1 in neocortex development.

    Science.gov (United States)

    Schmouth, Jean-François; Arenillas, David; Corso-Díaz, Ximena; Xie, Yuan-Yun; Bohacec, Slavita; Banks, Kathleen G; Bonaguro, Russell J; Wong, Siaw H; Jones, Steven J M; Marra, Marco A; Simpson, Elizabeth M; Wasserman, Wyeth W

    2015-07-24

    Nr2e1 (nuclear receptor subfamily 2, group e, member 1) encodes a transcription factor important in neocortex development. Previous work has shown that nuclear receptors can have hundreds of target genes, and bind more than 300 co-interacting proteins. However, recognition of the critical role of Nr2e1 in neural stem cells and neocortex development is relatively recent, so the molecular mechanisms involved for this nuclear receptor are only beginning to be understood. Serial analysis of gene expression (SAGE) has given researchers both qualitative and quantitative information pertaining to biological processes. Thus, in this work, six LongSAGE mouse libraries were generated from laser microdissected tissue samples of dorsal VZ/SVZ (ventricular zone and subventricular zone) from the telencephalon of wild-type (Wt) and Nr2e1-null embryos at the critical development ages E13.5, E15.5, and E17.5. We then used a novel approach, implementing multiple computational methods followed by biological validation to further our understanding of Nr2e1 in neocortex development. In this work, we have generated a list of 1279 genes that are differentially expressed in response to altered Nr2e1 expression during in vivo neocortex development. We have refined this list to 64 candidate direct-targets of NR2E1. Our data suggested distinct roles for Nr2e1 during different neocortex developmental stages. Most importantly, our results suggest a possible novel pathway by which Nr2e1 regulates neurogenesis, which includes Lhx2 as one of the candidate direct-target genes, and SOX9 as a co-interactor. In conclusion, we have provided new candidate interacting partners and numerous well-developed testable hypotheses for understanding the pathways by which Nr2e1 functions to regulate neocortex development.

  12. A Shortest-Path-Based Method for the Analysis and Prediction of Fruit-Related Genes in Arabidopsis thaliana.

    Science.gov (United States)

    Zhu, Liucun; Zhang, Yu-Hang; Su, Fangchu; Chen, Lei; Huang, Tao; Cai, Yu-Dong

    2016-01-01

    Biologically, fruits are defined as seed-bearing reproductive structures in angiosperms that develop from the ovary. The fertilization, development and maturation of fruits are crucial for plant reproduction and are precisely regulated by intrinsic genetic regulatory factors. In this study, we used Arabidopsis thaliana as a model organism and attempted to identify novel genes related to fruit-associated biological processes. Specifically, using validated genes, we applied a shortest-path-based method to identify several novel genes in a large network constructed using the protein-protein interactions observed in Arabidopsis thaliana. The described analyses indicate that several of the discovered genes are associated with fruit fertilization, development and maturation in Arabidopsis thaliana.
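
    The shortest-path idea in the abstract above can be sketched as follows: find shortest paths between pairs of known (validated) genes in an interaction network and count how often other genes sit on those paths. This is only a minimal sketch under stated assumptions: the network is unweighted, one shortest path per pair is taken via breadth-first search, and all gene names and function names are hypothetical. The abstract does not give the authors' actual network construction or scoring.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Breadth-first search; return one shortest path as a list of nodes, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def candidate_genes(graph, known_genes):
    """Count how often each non-seed node lies inside a shortest path between two known genes."""
    counts = {}
    for a, b in combinations(sorted(known_genes), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for node in path[1:-1]:
            if node not in known_genes:
                counts[node] = counts.get(node, 0) + 1
    return counts


# Toy undirected interaction network (hypothetical gene labels).
ppi = {
    "K1": ["X"],
    "K2": ["X"],
    "K3": ["X", "Y"],
    "X": ["K1", "K2", "K3"],
    "Y": ["K3", "Z"],
    "Z": ["Y"],
}
known = {"K1", "K2", "K3"}
counts = candidate_genes(ppi, known)
```

    Here the node X connects all three known genes and is the only candidate returned, while Y and Z never lie between two known genes. A real analysis would typically also filter such candidates against what is expected by chance.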

  13. Measurement of circulating transcripts and gene cluster analysis predicts and defines therapeutic efficacy of peptide receptor radionuclide therapy (PRRT) in neuroendocrine tumors

    International Nuclear Information System (INIS)

    Bodei, L.; Kidd, M.; Modlin, I.M.; Severi, S.; Nicolini, S.; Paganelli, G.; Drozdov, I.; Kwekkeboom, D.J.; Krenning, E.P.; Baum, R.P.

    2016-01-01

    Peptide receptor radionuclide therapy (PRRT) is an effective method for treating neuroendocrine tumors (NETs). It is limited, however, in the prediction of individual tumor response and the precise and early identification of changes in tumor size. Currently, response prediction is based on somatostatin receptor expression and efficacy by morphological imaging and/or chromogranin A (CgA) measurement. The aim of this study was to assess the accuracy of circulating NET transcripts as a measure of PRRT efficacy, and moreover to identify prognostic gene clusters in pretreatment blood that could be interpolated with relevant clinical features in order to define a biological index for the tumor and a predictive quotient for PRRT efficacy. NET patients (n = 54), M:F 37:17, median age 66, bronchial: n = 13, GEP-NET: n = 35, CUP: n = 6 were treated with ¹⁷⁷Lu-based PRRT (cumulative activity: 6.5-27.8 GBq, median 18.5). At baseline: 47/54 low-grade (G1/G2; bronchial typical/atypical), 31/49 ¹⁸FDG positive and 39/54 progressive. Disease status was assessed by RECIST1.1. Transcripts were measured by real-time quantitative reverse transcription PCR (qRT-PCR) and multianalyte algorithmic analysis (NETest); CgA by enzyme-linked immunosorbent assay (ELISA). Gene cluster (GC) derivations: regulatory network, protein:protein interactome analyses. Statistical analyses: chi-square, non-parametric measurements, multiple regression, receiver operating characteristic and Kaplan-Meier survival. The disease control rate was 72 %. Median PFS was not achieved (follow-up: 1-33 months, median: 16). Only grading was associated with response (p < 0.01). At baseline, 94 % of patients were NETest-positive, while CgA was elevated in 59 %. NETest accurately (89 %, χ² = 27.4; p = 1.2 × 10⁻⁷) correlated with treatment response, while CgA was 24 % accurate. Gene cluster expression (growth-factor signalome and metabolome) had an AUC of 0.74 ± 0.08 (z-statistic = 2.92, p < 0.004) for predicting

  14. Measurement of circulating transcripts and gene cluster analysis predicts and defines therapeutic efficacy of peptide receptor radionuclide therapy (PRRT) in neuroendocrine tumors

    Energy Technology Data Exchange (ETDEWEB)

    Bodei, L. [European Institute of Oncology, Division of Nuclear Medicine, Milan (Italy); LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Kidd, M. [Wren Laboratories, Branford, CT (United States); Modlin, I.M. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Yale School of Medicine, New Haven, CT (United States); Severi, S.; Nicolini, S.; Paganelli, G. [Istituto Scientifico Romagnolo per lo Studio e la Cura dei Tumori (IRST) IRCCS, Nuclear Medicine and Radiometabolic Units, Meldola (Italy); Drozdov, I. [Bering Limited, London (United Kingdom); Kwekkeboom, D.J.; Krenning, E.P. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Erasmus Medical Center, Nuclear Medicine Department, Rotterdam (Netherlands); Baum, R.P. [LuGenIum Consortium, Milan, Rotterdam, Bad Berka, London, Italy, Netherlands, Germany (Country Unknown); Zentralklinik Bad Berka, Theranostics Center for Molecular Radiotherapy and Imaging, Bad Berka (Germany)

    2016-05-15

    Peptide receptor radionuclide therapy (PRRT) is an effective method for treating neuroendocrine tumors (NETs). It is limited, however, in the prediction of individual tumor response and the precise and early identification of changes in tumor size. Currently, response prediction is based on somatostatin receptor expression and efficacy by morphological imaging and/or chromogranin A (CgA) measurement. The aim of this study was to assess the accuracy of circulating NET transcripts as a measure of PRRT efficacy, and moreover to identify prognostic gene clusters in pretreatment blood that could be interpolated with relevant clinical features in order to define a biological index for the tumor and a predictive quotient for PRRT efficacy. NET patients (n = 54), M:F 37:17, median age 66, bronchial: n = 13, GEP-NET: n = 35, CUP: n = 6 were treated with ¹⁷⁷Lu-based PRRT (cumulative activity: 6.5-27.8 GBq, median 18.5). At baseline: 47/54 low-grade (G1/G2; bronchial typical/atypical), 31/49 ¹⁸FDG positive and 39/54 progressive. Disease status was assessed by RECIST1.1. Transcripts were measured by real-time quantitative reverse transcription PCR (qRT-PCR) and multianalyte algorithmic analysis (NETest); CgA by enzyme-linked immunosorbent assay (ELISA). Gene cluster (GC) derivations: regulatory network, protein:protein interactome analyses. Statistical analyses: chi-square, non-parametric measurements, multiple regression, receiver operating characteristic and Kaplan-Meier survival. The disease control rate was 72 %. Median PFS was not achieved (follow-up: 1-33 months, median: 16). Only grading was associated with response (p < 0.01). At baseline, 94 % of patients were NETest-positive, while CgA was elevated in 59 %. NETest accurately (89 %, χ² = 27.4; p = 1.2 × 10⁻⁷) correlated with treatment response, while CgA was 24 % accurate. Gene cluster expression (growth-factor signalome and metabolome) had an AUC of 0.74 ± 0.08 (z-statistic = 2.92, p < 0

  15. Reduction in WT1 gene expression during early treatment predicts the outcome in patients with acute myeloid leukemia.

    Science.gov (United States)

    Andersson, Charlotta; Li, Xingru; Lorenz, Fryderyk; Golovleva, Irina; Wahlin, Anders; Li, Aihong

    2012-12-01

    Wilms tumor gene 1 (WT1) expression has been suggested as an applicable minimal residual disease marker in acute myeloid leukemia (AML). We evaluated the use of this marker in 43 adult AML patients. Quantitative assessment of WT1 gene transcripts was performed using real-time quantitative-polymerase chain reaction assay. Samples from both the peripheral blood and the bone marrow were analyzed at diagnosis and during follow-up. A strong correlation was observed between WT1 normalized with 2 different control genes (β-actin and ABL1, P0.05). A≥1-log reduction in WT1 expression in bone marrow samples taken freedom from relapse (P=0.010) when β-actin was used as control gene. Furthermore, a reduction in WT1 expression by ≥2 logs in peripheral blood samples taken at a later time point significantly correlated with a better outcome for overall survival (P=0.004) and freedom from relapse (P=0.012). This result was achieved when normalizing against both β-actin and ABL1. These results therefore suggest that WT1 gene expression can provide useful information for minimal residual disease detection in adult AML patients and that combined use of control genes can give more informative results.
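
    The "≥1-log" and "≥2-log" reductions mentioned in the abstract above refer to tenfold and hundredfold drops in normalized transcript level. A minimal sketch of that arithmetic follows, assuming expression values have already been normalized to a control gene; the function name and the numbers are hypothetical and are not data from the study.

```python
import math


def log_reduction(baseline, follow_up):
    """Return log10(baseline / follow_up) for two positive normalized expression values."""
    if baseline <= 0 or follow_up <= 0:
        raise ValueError("expression values must be positive")
    return math.log10(baseline / follow_up)


# Hypothetical normalized WT1 levels at diagnosis and at a later follow-up sample.
reduction = log_reduction(baseline=2500.0, follow_up=20.0)
meets_two_log = reduction >= 2.0  # a 125-fold drop is slightly more than 2 logs
```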

  16. The GENOTEND chip: a new tool to analyse gene expression in muscles of beef cattle for beef quality prediction.

    Science.gov (United States)

    Hocquette, Jean-Francois; Bernard-Capel, Carine; Vidal, Veronique; Jesson, Beline; Levéziel, Hubert; Renand, Gilles; Cassar-Malek, Isabelle

    2012-08-15

    Previous research programmes have described muscle biochemical traits and gene expression levels associated with beef tenderness. One of our results concerning the DNAJA1 gene (an Hsp40) was patented. This study aims to confirm the relationships previously identified between two gene families (heat shock proteins and energy metabolism) and beef quality. We developed an Agilent chip with specific probes for bovine muscular genes. More than 3000 genes involved in muscle biology or meat quality were selected from genetic, proteomic or transcriptomic studies, or from scientific publications. As far as possible, several probes were used for each gene (e.g. 17 probes for DNAJA1). RNA from Longissimus thoracis muscle samples was hybridised on the chips. Muscle samples were from four groups of Charolais cattle: two groups of young bulls and two groups of steers slaughtered in two different years. Principal component analysis, simple correlation of gene expression levels with tenderness scores, and then multiple regression analysis provided the means to detect the genes within two families (heat shock proteins and energy metabolism) which were the most associated with beef tenderness. For the 25 Charolais young bulls slaughtered in year 1, expression levels of DNAJA1 and other genes of the HSP family were related to the initial or overall beef tenderness. Similarly, expression levels of genes involved in fat or energy metabolism were related to the initial or overall beef tenderness but in the year 1 and year 2 groups of young bulls only. Generally, the genes individually correlated with tenderness are not consistent across genders and years, indicating the strong influence of rearing conditions on muscle characteristics related to beef quality. However, a group of HSP genes, which explained about 40% of the variability in tenderness in the group of 25 young bulls slaughtered in year 1 (considered as the reference group), was validated in the groups of 30 Charolais young

  17. The GENOTEND chip: a new tool to analyse gene expression in muscles of beef cattle for beef quality prediction

    Directory of Open Access Journals (Sweden)

    Hocquette Jean-Francois

    2012-08-01

    Background: Previous research programmes have described muscle biochemical traits and gene expression levels associated with beef tenderness. One of our results concerning the DNAJA1 gene (an Hsp40) was patented. This study aims to confirm the relationships previously identified between two gene families (heat shock proteins and energy metabolism) and beef quality. Results: We developed an Agilent chip with specific probes for bovine muscular genes. More than 3000 genes involved in muscle biology or meat quality were selected from genetic, proteomic or transcriptomic studies, or from scientific publications. As far as possible, several probes were used for each gene (e.g. 17 probes for DNAJA1). RNA from Longissimus thoracis muscle samples was hybridised on the chips. Muscle samples were from four groups of Charolais cattle: two groups of young bulls and two groups of steers slaughtered in two different years. Principal component analysis, simple correlation of gene expression levels with tenderness scores, and then multiple regression analysis provided the means to detect the genes within two families (heat shock proteins and energy metabolism) which were the most associated with beef tenderness. For the 25 Charolais young bulls slaughtered in year 1, expression levels of DNAJA1 and other genes of the HSP family were related to the initial or overall beef tenderness. Similarly, expression levels of genes involved in fat or energy metabolism were related to the initial or overall beef tenderness but in the year 1 and year 2 groups of young bulls only. Generally, the genes individually correlated with tenderness are not consistent across genders and years, indicating the strong influence of rearing conditions on muscle characteristics related to beef quality. However, a group of HSP genes, which explained about 40% of the variability in tenderness in the group of 25 young bulls slaughtered in year 1 (considered as the reference group), was

  18. Prediction of Genes Related to Positive Selection Using Whole-Genome Resequencing in Three Commercial Pig Breeds

    Directory of Open Access Journals (Sweden)

    HyoYoung Kim

    2015-12-01

    Selective sweep can cause genetic differentiation across populations, which allows for the identification of possible causative regions/genes underlying important traits. The pig has experienced a long history of allele frequency changes through artificial selection in the domestication process. We obtained an average of 329,482,871 sequence reads for 24 pigs from three pig breeds: Yorkshire (n = 5), Landrace (n = 13), and Duroc (n = 6). An average read depth of 11.7 was obtained using whole-genome resequencing on an Illumina HiSeq2000 platform. In this study, cross-population extended haplotype homozygosity and cross-population composite likelihood ratio tests were implemented to detect genes experiencing positive selection for the genome-wide resequencing data generated from three commercial pig breeds. In our results, 26, 7, and 14 genes from Yorkshire, Landrace, and Duroc, respectively, were detected by two kinds of statistical tests. Significant evidence for positive selection was identified on genes ST6GALNAC2 and EPHX1 in Yorkshire, PARK2 in Landrace, and BMP6, SLA-DQA1, and PRKG1 in Duroc. These genes are reportedly relevant to lactation, reproduction, meat quality, and growth traits. To understand how these single nucleotide polymorphisms (SNPs) related to positive selection affect protein function, we analyzed the effect of non-synonymous SNPs. Three SNPs (rs324509622, rs80931851, and rs80937718) in the SLA-DQA1 gene were significant in the enrichment tests, indicating strong evidence for positive selection in Duroc. Our analyses identified genes under positive selection for lactation, reproduction, and meat-quality and growth traits in Yorkshire, Landrace, and Duroc, respectively.

  19. HPV and high-risk gene expression profiles predict response to chemoradiotherapy in head and neck cancer, independent of clinical factors

    International Nuclear Information System (INIS)

    Jong, Monique C. de; Pramana, Jimmy; Knegjens, Joost L.; Balm, Alfons J.M.; Brekel, Michiel W.M. van den; Hauptmann, Michael; Begg, Adrian C.; Rasch, Coen R.N.

    2010-01-01

    Purpose: The purpose of this study was to combine gene expression profiles and clinical factors to provide a better prediction model of local control after chemoradiotherapy for advanced head and neck cancer. Material and methods: Gene expression data were available for a series of 92 advanced stage head and neck cancer patients treated with primary chemoradiotherapy. The effect of the Chung high-risk and Slebos HPV expression profiles on local control was analyzed in a model with age at diagnosis, gender, tumor site, tumor volume, T-stage and N-stage and HPV profile status. Results: Among 75 patients included in the study, the only factors significantly predicting local control were tumor site (oral cavity vs. pharynx, hazard ratio 4.2 [95% CI 1.4-12.5]), Chung gene expression status (high vs. low risk profile, hazard ratio 4.4 [95% CI 1.5-13.3]) and HPV profile (negative vs. positive profile, hazard ratio 6.2 [95% CI 1.7-22.5]). Conclusions: Chung high-risk expression profile and a negative HPV expression profile were significantly associated with increased risk of local recurrence after chemoradiotherapy in advanced pharynx and oral cavity tumors, independent of clinical factors.

  20. Development and validation of a gene expression-based signature to predict distant metastasis in locoregionally advanced nasopharyngeal carcinoma: a retrospective, multicentre, cohort study.

    Science.gov (United States)

    Tang, Xin-Ran; Li, Ying-Qin; Liang, Shao-Bo; Jiang, Wei; Liu, Fang; Ge, Wen-Xiu; Tang, Ling-Long; Mao, Yan-Ping; He, Qing-Mei; Yang, Xiao-Jing; Zhang, Yuan; Wen, Xin; Zhang, Jian; Wang, Ya-Qin; Zhang, Pan-Pan; Sun, Ying; Yun, Jing-Ping; Zeng, Jing; Li, Li; Liu, Li-Zhi; Liu, Na; Ma, Jun

    2018-03-01

    Gene expression patterns can be used as prognostic biomarkers in various types of cancers. We aimed to identify a gene expression pattern for individual distant metastatic risk assessment in patients with locoregionally advanced nasopharyngeal carcinoma. In this multicentre, retrospective, cohort analysis, we included 937 patients with locoregionally advanced nasopharyngeal carcinoma from three Chinese hospitals: the Sun Yat-sen University Cancer Center (Guangzhou, China), the Affiliated Hospital of Guilin Medical University (Guilin, China), and the First People's Hospital of Foshan (Foshan, China). Using microarray analysis, we profiled mRNA gene expression between 24 paired locoregionally advanced nasopharyngeal carcinoma tumours from patients at Sun Yat-sen University Cancer Center with or without distant metastasis after radical treatment. Differentially expressed genes were examined using digital expression profiling in a training cohort (Guangzhou training cohort; n=410) to build a gene classifier using a penalised regression model. We validated the prognostic accuracy of this gene classifier in an internal validation cohort (Guangzhou internal validation cohort, n=204) and two external independent cohorts (Guilin cohort, n=165; Foshan cohort, n=158). The primary endpoint was distant metastasis-free survival. Secondary endpoints were disease-free survival and overall survival. We identified 137 differentially expressed genes between metastatic and non-metastatic locoregionally advanced nasopharyngeal carcinoma tissues. A distant metastasis gene signature for locoregionally advanced nasopharyngeal carcinoma (DMGN) that consisted of 13 genes was generated to classify patients into high-risk and low-risk groups in the training cohort. Patients with high-risk scores in the training cohort had shorter distant metastasis-free survival (hazard ratio [HR] 4·93, 95% CI 2·99-8·16; padvanced nasopharyngeal carcinoma and might be able to predict which patients benefit

  1. Mammalian transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes and are predicted to act as transcriptional activator hubs.

    Science.gov (United States)

    Joshi, Anagha

    2014-12-30

    Transcriptional hotspots are defined as genomic regions bound by multiple factors. They have been identified recently as cell type specific enhancers regulating developmentally essential genes in many species such as worm, fly and humans. The in-depth analysis of hotspots across multiple cell types in the same species still remains to be explored and can bring new biological insights. We therefore collected 108 transcription-related factor (TF) ChIP sequencing data sets in ten murine cell types and classified the peaks in each cell type in three groups according to binding occupancy as singletons (low-occupancy), combinatorials (mid-occupancy) and hotspots (high-occupancy). The peaks in the three groups clustered largely according to the occupancy, suggesting priming of genomic loci for mid occupancy irrespective of cell type. We then characterized hotspots for diverse structural and functional properties. The genes neighbouring hotspots had a small overlap with hotspot genes in other cell types and were highly enriched for cell type specific function. Hotspots were enriched for sequence motifs of key TFs in that cell type and more than 90% of hotspots were occupied by pioneering factors. Though we did not find any sequence signature in the three groups, the H3K4me1 binding profile had bimodal peaks at hotspots, distinguishing hotspots from mono-modal H3K4me1 singletons. In ES cells, differentially expressed genes after perturbation of activators were enriched for hotspot genes, suggesting hotspots primarily act as transcriptional activator hubs. Finally, we proposed that ES hotspots might be under control of SetDB1 and not DNMT for silencing. Transcriptional hotspots are enriched for tissue specific enhancers near cell type specific highly expressed genes. In ES cells, they are predicted to act as transcriptional activator hubs and might be under SetDB1 control for silencing.

  2. Prediction of drug efficacy for cancer treatment based on comparative analysis of chemosensitivity and gene expression data

    DEFF Research Database (Denmark)

    Wan, Peng; Li, Qiyuan; Larsen, Jens Erik Pontoppidan

    2012-01-01

    The NCI60 database is the largest available collection of compounds with measured anti-cancer activity. The strengths and limitations for using the NCI60 database as a source of new anti-cancer agents are explored and discussed in relation to previous studies. We selected a sub-set of 2333...... and in a data set of expression profiles of 1901 genes for the corresponding tumor cell lines. Five clusters were identified based on the gene expression data using self-organizing maps (SOM), comprising leukemia, melanoma, ovarian and prostate, basal breast, and luminal breast cancer cells, respectively....... The strong difference in gene expression between basal and luminal breast cancer cells was reflected clearly in the chemosensitivity data. Although most compounds in the data set were of low potency, high efficacy compounds that showed specificity with respect to tissue of origin could be found. Furthermore...

  3. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

    Science.gov (United States)

    Zhou, Hang; Yang, Yang; Shen, Hong-Bin

    2017-03-15

    Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  4. Evaluating the performance of clinical criteria for predicting mismatch repair gene mutations in Lynch syndrome: a comprehensive analysis of 3,671 families.

    Science.gov (United States)

    Steinke, Verena; Holzapfel, Stefanie; Loeffler, Markus; Holinski-Feder, Elke; Morak, Monika; Schackert, Hans K; Görgens, Heike; Pox, Christian; Royer-Pokora, Brigitte; von Knebel-Doeberitz, Magnus; Büttner, Reinhard; Propping, Peter; Engel, Christoph

    2014-07-01

    Carriers of mismatch repair (MMR) gene mutations have a high lifetime risk for colorectal and endometrial cancers, as well as other malignancies. As mutation analysis to detect these patients is expensive and time-consuming, clinical criteria and tumor-tissue analysis are widely used as pre-screening methods. The aim of our study was to evaluate the performance of commonly applied clinical criteria (the Amsterdam I and II Criteria, and the original and revised Bethesda Guidelines) and the results of tumor-tissue analysis in predicting MMR gene mutations. We analyzed 3,671 families from the German HNPCC Registry and divided them into nine mutually exclusive groups with different clinical criteria. A total of 680 families (18.5%) were found to have a pathogenic MMR gene mutation. Among all 1,284 families with microsatellite instability-high (MSI-H) colorectal cancer, the overall mutation detection rate was 53.0%. Mutation frequencies and their distribution between the four MMR genes differed significantly between clinical groups (p small-bowel cancer (p small-bowel cancer were clinically relevant predictors for Lynch syndrome. © 2013 UICC.

  5. Gene expression signatures predict outcome in non-muscle invasive bladder carcinoma - a multi-center validation study

    DEFF Research Database (Denmark)

    Andersen, Lars Dyrskjøt; Zieger, Karsten; Real, Francisco X.

    2007-01-01

    and carcinoma in situ (CIS) and for predicting disease recurrence and progression. EXPERIMENTAL DESIGN: We analyzed tumors from 404 patients diagnosed with bladder cancer in hospitals in Denmark, Sweden, England, Spain, and France using custom microarrays. Molecular classifications were compared with pathologic....... CONCLUSION: This multicenter validation study confirms in an independent series the clinical utility of molecular classifiers to predict the outcome of patients initially diagnosed with non-muscle-invasive bladder cancer. This information may be useful to better guide patient treatment....

  6. Do aberrant crypt foci have predictive value for the occurrence of colorectal tumours? Potential of gene expression profiling in tumours

    NARCIS (Netherlands)

    Wijnands, M.V.W.; Erk, van M.J.; Doornbos, R.P.; Krul, C.A.M.; Woutersen, R.A.

    2004-01-01

    The effects of different dietary compounds on the formation of aberrant crypt foci (ACF) and colorectal tumours and on the expression of a selection of genes were studied in rats. Azoxymethane-treated male F344 rats were fed either a control diet or a diet containing 10% wheat bran (WB), 0.2%

  7. A predictive coexpression network identifies novel genes controlling the seed-to-seedling phase transition in Arabidopsis thaliana

    NARCIS (Netherlands)

    Silva, Anderson Tadeu; Ribone, Pamela A.; Chan, Raquel L.; Ligterink, Wilco; Hilhorst, Henk W.M.

    2016-01-01

    The transition from a quiescent dry seed to an actively growing photoautotrophic seedling is a complex and crucial trait for plant propagation. This study provides a detailed description of global gene expression in seven successive developmental stages of seedling establishment in Arabidopsis

  8. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks

    Science.gov (United States)

    The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in

  9. The relative abundance of predicted genes associated with ammonia-oxidation, nitrate reduction, and biomass decomposition in mineral soil are altered by intensive timber harvest.

    Science.gov (United States)

    Mushinski, R. M.; Zhou, Y.; Gentry, T. J.; Boutton, T. W.

    2017-12-01

    Forest ecosystems in the southern United States are substantially altered by anthropogenic disturbances such as timber harvest and land conversion, with effects being observed in carbon and nutrient pools as well as biogeochemical processes. Furthermore, the desire to develop renewable energy sources in the form of biomass extraction from logging residues may result in alterations in soil community structure and function. While the impact of forest management on soil physicochemical properties of the region has been studied, its long-term effect on soil bacterial community composition and metagenomic potential is relatively unknown, especially at deeper soil depths. This study investigates how organic matter removal intensities associated with timber harvest influence decadal-scale alterations in bacterial community structure and functional potential in the upper 1 m of the soil profile, 18 years post-harvest in a Pinus taeda L. forest of eastern Texas. Amplicon sequencing of the 16S rRNA gene was used in conjunction with soil chemical analyses to evaluate treatment-induced differences in community composition and potential environmental drivers of associated change. Furthermore, functional potential was assessed by using amplicon data to make metagenomic predictions. Results indicate that increasing organic matter removal intensity leads to altered community composition and the relative abundance of dominant OTUs annotated to Burkholderia and Aciditerrimonas. The relative abundance of predicted genes associated with dissimilatory nitrate reduction and denitrification was highest in the most intensively harvested treatment while genes involved in nitrification were significantly lower in the most intensively harvested treatment. Furthermore, genes associated with glycosyltransferases were significantly reduced with increasing harvest intensity while polysaccharide lyases increased. These results imply that intensive organic matter removal may create

  10. A functional polymorphism in a serotonin transporter gene (5-HTTLPR) interacts with 9/11 to predict gun-carrying behavior.

    Science.gov (United States)

    Barnes, J C; Beaver, Kevin M; Boutwell, Brian B

    2013-01-01

    On September 11, 2001, one of the deadliest terrorist attacks in US history took place on American soil and people around the world were impacted in myriad ways. Building on prior literature which suggests individuals are more likely to purchase a gun for self-protection if they are fearful of being victimized, the authors hypothesized that the terrorist attacks of 9/11 would lead to an increase in gun carrying among US residents. At the same time, a line of research has shown that a polymorphism in the 5-HTT gene (i.e., 5-HTTLPR) interacts with environmental stressors to predict a range of psychopathologies and behaviors. Thus, it was hypothesized that 9/11 and 5-HTTLPR would interact to predict gun carrying. The results supported both hypotheses by revealing a positive association between 9/11 and gun carrying (b = .426, odds ratio = 1.531, standard error for b = .194, z = 2.196, p = .028) in the full sample of respondents (n = 15,052) and a statistically significant interaction between 9/11 and 5-HTTLPR in the prediction of gun carrying (b = -1.519, odds ratio = .219, standard error for b = .703, z = -2.161, p = .031) in the genetic subsample of respondents (n = 2,350). This is one of the first studies to find an association between 9/11 and gun carrying and, more importantly, is the first study to report a gene-environment interaction (GxE) between a measured gene and a terrorist attack.
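
    The coefficients reported in this record can be related to one another with a short calculation. The sketch below assumes the model is an ordinary logistic regression with Wald tests, which the abstract does not state explicitly; the function name `logit_summary` is illustrative only. Under that assumption the odds ratio is exp(b), z is b divided by its standard error, and the two-sided p-value comes from the standard normal distribution.

```python
import math


def logit_summary(b, se):
    """Return (odds_ratio, z, two_sided_p) for a logistic-regression coefficient.

    Assumes a Wald test against the standard normal distribution.
    """
    odds_ratio = math.exp(b)
    z = b / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return odds_ratio, z, p


# Values of b and its standard error as quoted in the abstract.
main_effect = logit_summary(0.426, 0.194)
interaction = logit_summary(-1.519, 0.703)

for label, (odds_ratio, z, p) in (("9/11 main effect", main_effect),
                                  ("9/11 x 5-HTTLPR", interaction)):
    print(f"{label}: OR={odds_ratio:.3f}, z={z:.3f}, p={p:.3f}")
```

    Rounded to three decimals, both rows agree with the odds ratios, z values and p-values quoted in the abstract, which suggests the reported statistics are internally consistent.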

  11. Identification and Construction of Combinatory Cancer Hallmark-Based Gene Signature Sets to Predict Recurrence and Chemotherapy Benefit in Stage II Colorectal Cancer.

    Science.gov (United States)

    Gao, Shanwu; Tibiche, Chabane; Zou, Jinfeng; Zaman, Naif; Trifiro, Mark; O'Connor-McCourt, Maureen; Wang, Edwin

    2016-01-01

    Decisions regarding adjuvant therapy in patients with stage II colorectal cancer (CRC) have been among the most challenging and controversial in oncology over the past 20 years. The objective was to develop robust combinatory cancer hallmark-based gene signature sets (CSS sets) that more accurately predict prognosis and identify a subset of patients with stage II CRC who could gain survival benefits from adjuvant chemotherapy. Thirteen retrospective studies of patients with stage II CRC who had clinical follow-up and adjuvant chemotherapy were analyzed. Totals of 162 and 843 patients from 2 and 11 independent cohorts were used as the discovery and validation cohorts, respectively. A total of 1005 patients with stage II CRC were included in the 13 cohorts. Among them, 84 of 416 patients in 3 independent cohorts received fluorouracil-based adjuvant chemotherapy. The main outcome was identification of CSS sets to predict relapse-free survival and identify a subset of patients with stage II CRC who could gain substantial survival benefits from fluorouracil-based adjuvant chemotherapy. Eight cancer hallmark-based gene signatures (30 genes each) were identified and used to construct CSS sets for determining prognosis. The CSS sets were validated in 11 independent cohorts of 767 patients with stage II CRC who did not receive adjuvant chemotherapy. The CSS sets accurately stratified patients into low-, intermediate-, and high-risk groups. Five-year relapse-free survival rates were 94%, 78%, and 45%, respectively, representing 60%, 28%, and 12% of patients with stage II disease. The 416 patients with CSS set-defined high-risk stage II CRC who received fluorouracil-based adjuvant chemotherapy showed a substantial gain in survival benefits from the treatment (ie, recurrence reduced by 30%-40% in 5 years). The CSS sets substantially outperformed other prognostic predictors of stage II CRC. They are more accurate and robust for prognostic predictions and facilitate the identification of patients with stage

  12. Quantitative Trait Locus (QTL) meta-analysis and comparative genomics for candidate gene prediction in perennial ryegrass (Lolium perenne L.)

    Directory of Open Access Journals (Sweden)

    Shinozuka Hiroshi

    2012-11-01

    Full Text Available Abstract Background: In crop species, QTL analysis is commonly used for identification of factors contributing to variation of agronomically important traits. For perennial ryegrass, an important pasture species, a large number of QTLs have been reported based on analysis of biparental mapping populations. Further characterisation of those QTLs is, however, essential for utilisation in varietal improvement programs. Results: A bibliographic survey of perennial ryegrass trait-dissection studies identified a total of 560 QTLs from previously published papers, of which 189, 270 and 101 were classified as morphology-, physiology- and resistance/tolerance-related loci, respectively. The collected dataset permitted a subsequent meta-QTL study and implementation of a cross-species candidate gene identification approach. A meta-QTL analysis based on use of the BioMercator software was performed to identify two consensus regions for pathogen resistance traits. Genes that are candidates for causal polymorphism underpinning perennial ryegrass QTLs were identified through in silico comparative mapping using rice databases, and 7 genes were assigned to the p150/112 reference map. Markers linked to the LpDGL1, LpPh1 and LpPIPK1 genes were located close to plant size, leaf extension time and heading date-related QTLs, respectively, suggesting that these genes may be functionally associated with important agronomic traits in perennial ryegrass. Conclusions: Functional markers are valuable for QTL meta-analysis and comparative genomics. Enrichment of such genetic markers may permit further detailed characterisation of QTLs. The outcomes of QTL meta-analysis and comparative genomics studies may be useful for accelerated development of novel perennial ryegrass cultivars with desirable traits.

  13. Methylated Host Cell Gene Promoters and Human Papillomavirus Type 16 and 18 Predicting Cervical Lesions and Cancer.

    Directory of Open Access Journals (Sweden)

    Nina Milutin Gašperov

    Full Text Available Change in the host and/or human papillomavirus (HPV) DNA methylation profile is probably one of the main factors responsible for the malignant progression of cervical lesions to cancer. To investigate those changes we studied 173 cervical samples with different grades of cervical lesion, from normal to cervical cancer. The methylation status of nine cellular gene promoters, CCNA1, CDH1, C13ORF18, DAPK1, HIC1, RARβ2, hTERT1, hTERT2 and TWIST1, was investigated by Methylation Specific Polymerase Chain Reaction (MSP). The methylation of HPV18 L1-gene was also investigated by MSP, while the methylated cytosines within four regions, L1, 5'LCR, enhancer, and promoter of the HPV16 genome covering 19 CpG sites were evaluated by bisulfite sequencing. Statistically significant methylation biomarkers distinguishing cervical precursor lesions from normal cervix were primarily C13ORF18 and secondly CCNA1, and those distinguishing cervical cancer from normal or cervical precursor lesions were CCNA1, C13ORF18, hTERT1, hTERT2 and TWIST1. In addition, the methylation analysis of individual CpG sites of the HPV16 genome in different sample groups, notably the 7455 and 7694 sites, proved to be more important than the overall methylation frequency. The majority of HPV18 positive samples contained both methylated and unmethylated L1 gene, and samples with L1-gene methylated forms alone had better prognosis when correlated with the host cell gene promoters' methylation profiles. In conclusion, both cellular and viral methylation biomarkers should be used for monitoring cervical lesion progression to prevent invasive cervical cancer.

  14. Transferrin Level Before Treatment and Genetic Polymorphism in HFE Gene as Predictive Markers for Response to Adalimumab in Crohn's Disease Patients.

    Science.gov (United States)

    Repnik, Katja; Koder, Silvo; Skok, Pavel; Ferkolj, Ivan; Potočnik, Uroš

    2016-08-01

    Tumor necrosis factor α inhibitors (anti-TNF) have improved treatment of several complex diseases, including Crohn's disease (CD). However, the effect varies and approximately one-third of the patients do not respond. Since blood parameters as well as genetic factors have shown great potential to predict response during treatment, the aim of the study was to evaluate the association of the HFE and TF genes and haematological parameters with response to anti-TNF treatment with adalimumab (ADA) in Slovenian refractory CD patients. Single nucleotide polymorphisms (SNPs) rs1799852 in gene TF and rs2071303 in gene HFE were genotyped in 68 refractory CD patients for whom response was measured using the inflammatory bowel disease questionnaire (IBDQ) index. Haematological parameters and IBDQ index were determined before therapy and after 4, 12, 20 and 30 weeks. We found a novel strong association between SNP rs2071303 in gene HFE and response to ADA treatment; in particular, patients with the G allele had a better response after 20 weeks than those with the A allele (p = 0.008). Further, we found a strong association between transferrin level at baseline and treatment response after 12, 20 and 30 weeks, where average transferrin level before therapy was lower in responders (2.38 g/L) compared to non-responders (2.89 g/L, p = 0.005). An association was found between transferrin level in week 30 and SNP rs1799852 (p = 0.023), and between MCHC level before treatment and SNP rs2071303 (p = 0.007). Our results suggest that the SNP in gene HFE as well as haematological markers serve as promising prognostic markers of response to anti-TNF treatment in CD patients.

  15. Two-gene signature improves the discriminatory power of IASLC/ATS/ERS classification to predict the survival of patients with early-stage lung adenocarcinoma

    Directory of Open Access Journals (Sweden)

    Sun Y

    2016-07-01

    Full Text Available Yifeng Sun,1,* Likun Hou,2,* Yu Yang,1 Huikang Xie,2 Yang Yang,1 Zhigang Li,1 Heng Zhao,1 Wen Gao,3 Bo Su4 1Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiaotong University, 2Department of Pathology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 3Department of Thoracic Surgery, Shanghai Huadong Hospital, Fudan University School of Medicine, Shanghai, 4Central Lab, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, People’s Republic of China *These authors contributed equally to this work Background: In this study, we investigated the contribution of a gene expression–based signature (composed of BAG1, BRCA1, CDC6, CDK2AP1, ERBB3, FUT3, IL11, LCK, RND3, SH3BGR) to survival prediction for early-stage lung adenocarcinoma categorized by the new International Association for the Study of Lung Cancer (IASLC)/the American Thoracic Society (ATS)/the European Respiratory Society (ERS) classification. We also aimed to verify whether gene signature improves the risk discrimination of IASLC/ATS/ERS classification in early-stage lung adenocarcinoma. Patients and methods: Total RNA was extracted from 93 patients with pathologically confirmed TNM stage Ia and Ib lung adenocarcinoma. The mRNA expression levels of ten genes in the signature (BAG1, BRCA1, CDC6, CDK2AP1, ERBB3, FUT3, IL11, LCK, RND3, and SH3BGR) were detected using real-time polymerase chain reaction. Each patient was categorized according to the new IASLC/ATS/ERS classification by accessing hematoxylin–eosin-stained slides. The corresponding Kaplan–Meier survival analysis by the log-rank statistic, multivariate Cox proportional hazards modeling, and c-index calculation were conducted using the programming language R (Version 2.15.1) with the “risksetROC” package. Results: The multivariate analysis demonstrated that the risk factor of the ten-gene expression signature can significantly improve the discriminatory
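
    The c-index named in this record measures how often a model ranks patients correctly by risk. The sketch below is a plain-Python illustration of Harrell's concordance index on made-up data; it is not the authors' R code and does not reproduce the "risksetROC" package, and the function name and toy values are assumptions chosen for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the subject with the earlier event has the higher risk score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, event flag (1 = death), model risk.
toy_times = [5, 8, 12, 20, 24]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.7, 0.2, 0.1]

c_index = concordance_index(toy_times, toy_events, toy_risk)
print(f"c-index = {c_index:.3f}")
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing the c-index of a classification alone with that of the classification plus a gene signature is one way to quantify added discriminatory power.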

  16. Metagenomic Analyses Reveal That Energy Transfer Gene Abundances Can Predict the Syntrophic Potential of Environmental Microbial Communities

    Directory of Open Access Journals (Sweden)

    Lisa Oberding

    2016-01-01

    Full Text Available Hydrocarbon compounds can be biodegraded by anaerobic microorganisms to form methane through an energetically interdependent metabolic process known as syntrophy. The microorganisms that perform this process as well as the energy transfer mechanisms involved are difficult to study and thus are still poorly understood, especially on an environmental scale. Here, metagenomic data was analyzed for specific clusters of orthologous groups (COGs related to key energy transfer genes thus far identified in syntrophic bacteria, and principal component analysis was used in order to determine whether potentially syntrophic environments could be distinguished using these syntroph related COGs as opposed to universally present COGs. We found that COGs related to hydrogenase and formate dehydrogenase genes were able to distinguish known syntrophic consortia and environments with the potential for syntrophy from non-syntrophic environments, indicating that these COGs could be used as a tool to identify syntrophic hydrocarbon biodegrading environments using metagenomic data.

  17. Obesity risk prediction among women of Upper Egypt: The impact of serum vaspin and vaspin rs2236242 gene polymorphism.

    Science.gov (United States)

    Abdel Ghany, Soad M; Sayed, Ayat A; El-Deek, Sahar E M; ElBadre, Hala M; Dahpy, Marwa A; Saleh, Medhat A; Sharaf El-Deen, Hanan; Mustafa, Mohamed H

    2017-08-30

    Vaspin is an adipokine that is potentially linking obesity, insulin resistance, metabolic syndrome and type-2 diabetes. The present study aimed to investigate the impact of vaspin rs2236242 gene polymorphism on the risk of obesity, diabetes, their metabolic traits, and serum vaspin levels in a sample of Upper Egyptian women. A total of 224 subjects, 112 obese (62 non-diabetics, 50 diabetics) and 112 controls were included in this case control study. Genotyping of the vaspin rs2236242 polymorphism was performed using tetra-amplification refractory mutation system-polymerase chain reaction (T-ARMS-PCR) and serum vaspin levels were estimated by ELISA. The minor (A) allele of vaspin rs2236242 gene polymorphism was significantly lower in obese (30.8%) than controls (43.7%) (P=0.005). The protective effect was evident in dominant and recessive inheritance models (TT vs TA+AA, P=0.004 and TT+TA vs AA, P=0.036). After adjusting genotypes for diabetes there was no significant association between vaspin rs2236242 gene polymorphism and obesity but significant association was maintained in the obese diabetics. Vaspin serum levels were found to be lower in minor protective (AA) genotype carriers than the other two genotypes (Pobese diabetics and non-diabetics than controls (Pobesity and diabetes but this relation is largely ascribed to its effect on insulin resistance. The serum vaspin concentration was lower in minor protective allele carriers. To the best of our knowledge, this is the first study of vaspin SNP in Upper Egyptian women. The entire understanding of vaspin intimate mechanistic action might enable the development of novel etiology-based treatment strategies for obesity, the complex genetic trait. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Building and validating a prediction model for paediatric type 1 diabetes risk using next generation targeted sequencing of class II HLA genes.

    Science.gov (United States)

    Zhao, Lue Ping; Carlsson, Annelie; Larsson, Helena Elding; Forsander, Gun; Ivarsson, Sten A; Kockum, Ingrid; Ludvigsson, Johnny; Marcus, Claude; Persson, Martina; Samuelsson, Ulf; Örtqvist, Eva; Pyo, Chul-Woo; Bolouri, Hamid; Zhao, Michael; Nelson, Wyatt C; Geraghty, Daniel E; Lernmark, Åke

    2017-11-01

    It is of interest to predict possible lifetime risk of type 1 diabetes (T1D) in young children for recruiting high-risk subjects into longitudinal studies of effective prevention strategies. Utilizing a case-control study in Sweden, we applied a recently developed next generation targeted sequencing technology to genotype class II genes and applied an object-oriented regression to build and validate a prediction model for T1D. In the training set, estimated risk scores were significantly different between patients and controls (P = 8.12 × 10^-92), and the area under the curve (AUC) from the receiver operating characteristic (ROC) analysis was 0.917. Using the validation data set, we validated the result with AUC of 0.886. Combining both training and validation data resulted in a predictive model with AUC of 0.903. Further, we performed a "biological validation" by correlating risk scores with 6 islet autoantibodies, and found that the risk score was significantly correlated with IA-2A (Z-score = 3.628, P < 0.001). When applying this prediction model to the Swedish population, where the lifetime T1D risk ranges from 0.5% to 2%, we anticipate identifying approximately 20 000 high-risk subjects after testing all newborns, and this calculation would identify approximately 80% of all patients expected to develop T1D in their lifetime. Through both empirical and biological validation, we have established a prediction model for estimating lifetime T1D risk, using class II HLA. This prediction model should prove useful for future investigations to identify high-risk subjects for prevention research in high-risk populations. Copyright © 2017 John Wiley & Sons, Ltd.
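
    The AUC values quoted in this record have a simple interpretation: the probability that a randomly chosen patient receives a higher risk score than a randomly chosen control. The sketch below computes the AUC that way on made-up scores; the function name and data are hypothetical, and it says nothing about how the authors' object-oriented regression produces its risk scores.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number
    of pairs; tied scores count as half a correct ranking.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for four patients and four controls.
toy_cases = [0.9, 0.8, 0.6, 0.4]
toy_controls = [0.7, 0.5, 0.3, 0.2]

auc = roc_auc(toy_cases, toy_controls)
print(f"AUC = {auc:.4f}")
```

    On this reading, the reported validation AUC of 0.886 would mean that the model ranks a patient above a control in roughly 89 of 100 such pairs.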

  19. ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2005-10-01

    Full Text Available Abstract Background: Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems – hence the need to develop novel strategies. Results: We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid limitations of splice sites prediction (mainly, over-predictions due to independent single EST alignments to the genomic sequence), our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment and adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. Conclusion: Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.

  20. Fine mapping and candidate gene prediction of a pleiotropic quantitative trait locus for yield-related trait in Zea mays.

    Directory of Open Access Journals (Sweden)

    Ruixiang Liu

    Full Text Available The yield of maize grain is a highly complex quantitative trait that is controlled by multiple quantitative trait loci (QTLs) with small effects, and is frequently influenced by multiple genetic and environmental factors. Thus, it is challenging to clone a QTL for grain yield in the maize genome. Previously, we identified a major QTL, qKNPR6, for kernel number per row (KNPR) across multiple environments, and developed two nearly isogenic lines, SL57-6 and Ye478, which differ only in the allelic constitution at the short segment harboring the QTL. Recently, qKNPR6 was re-evaluated in segregating populations derived from SL57-6×Ye478, and was narrowed down to a 2.8 cM interval, which explained 56.3% of the phenotypic variance of KNPR in 201 F(2:3) families. The QTL simultaneously affected ear length, kernel weight and grain yield. Furthermore, a large F(2) population with more than 12,800 plants, 191 recombinant chromosomes and 10 overlapping recombinant lines placed qKNPR6 into a 0.91 cM interval corresponding to 198 kb of the B73 reference genome. In this region, six genes with expressed sequence tag (EST) evidence were annotated. The expression pattern and DNA diversity of the six genes were assayed in Ye478 and SL57-6. The possible candidate gene and the pathway involved in inflorescence development were discussed.

  1. Angiotensin-converting enzyme gene polymorphism in arrhythmogenic right ventricular dysplasia: is DD genotype helpful in predicting syncope risk?

    Science.gov (United States)

    Ozben, Beste; Altun, Ibrahim; Sabri Hancer, Veysel; Bilge, Ahmet Kaya; Tanrikulu, Azra Meryem; Diz-Kucukkaya, Reyhan; Fak, Ali Serdar; Yilmaz, Ercument; Adalet, Kamil

    2008-12-01

    Arrhythmogenic right ventricular dysplasia (ARVD) is a heritable disorder characterised by fibrofatty replacement of right ventricular myocytes and increased risk of ventricular arrhythmias and sudden cardiac death. Angiotensin-converting enzyme (ACE) gene insertion/deletion (I/D) polymorphism affects myocardial ACE levels. DD genotype favours myocardial fibrosis and is associated with malignant ventricular tachycardia. The aim of this study was to explore ACE gene polymorphism in ARVD patients. Twenty-nine patients with ARVD and 24 controls were included. All ARVD patients had documented sustained ventricular tachycardia. Thirteen patients had syncopal episodes. Six patients were resuscitated from sudden cardiac death. ACE gene polymorphism was identified by polymerase chain reaction technique. There was no significant difference in DD genotype frequency between ARVD patients and controls (44.8% vs. 45.8%, p=0.94). However, DD genotype frequency was significantly higher in ARVD patients with syncopal episodes compared to those without syncope (69.2% vs. 25.0%, p=0.017, odds ratio: 6.750, 95% confidence interval: 1.318-34.565). DD genotype was also detected at a higher frequency in patients with a family history of sudden cardiac death (66.7% vs. 39.1%, p=0.36). High prevalence of DD genotype in ARVD patients with syncope suggests that ACE I/D polymorphism might be useful in identifying high-risk patients for syncope.
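
    The odds ratio and confidence interval in this record can be reproduced from a 2x2 table. The counts used below are not given in the abstract; they are inferred from the reported percentages (9 of 13 patients with syncope and 4 of 16 without syncope carrying the DD genotype, which gives 69.2% and 25.0%), so treat them as an assumption. The interval uses the standard log (Woolf) method, which may or may not be what the authors used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-method) confidence interval.

    a, b: exposed / unexposed counts in the first group
    c, d: exposed / unexposed counts in the second group
    z:    normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Inferred counts (assumption): DD / non-DD with syncope, DD / non-DD without.
odds_ratio, lower, upper = odds_ratio_ci(9, 4, 4, 12)
print(f"OR = {odds_ratio:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

    With these inferred counts the result agrees with the published odds ratio of 6.750 and interval of 1.318-34.565 to within rounding. The width of the interval reflects the small sample (29 patients).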

  2. Structural implications of mutations in the pea SYM8 symbiosis gene, the DMI1 ortholog, encoding a predicted ion channel

    DEFF Research Database (Denmark)

    Edwards, Anne; Heckmann, Anne Birgitte Lau; Yousafzai, Faridoon

    2007-01-01

    the aspartate to valine and identified a missense mutation (changing alanine to valine adjacent to the aspartate residues) in this predicted filter region; both mutations caused a loss of function. We also identified a loss-of-function missense mutation (changing arginine to isoleucine) in a domain proposed...

  3. High-intensity sweetener consumption and gut microbiome content and predicted gene function in a cross-sectional study of adults in the United States.

    Science.gov (United States)

    Frankenfeld, Cara L; Sikaroodi, Masoumeh; Lamb, Evan; Shoemaker, Sarah; Gillevet, Patrick M

    2015-10-01

    To evaluate gut microbiome in relation to recent high-intensity sweetener consumption in healthy adults. Thirty-one adults completed a four-day food record and provided a fecal sample on the fifth day. Bacterial community in the samples was analyzed using multitag pyrosequencing. Across consumers and nonconsumers of aspartame and acesulfame-K, bacterial abundance was compared using nonparametric statistics, and bacterial diversity was compared using UniFrac analysis. Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was used to predict mean relative abundance of gene function. There were seven aspartame consumers and seven acesulfame-K consumers. Three individuals overlapped groups, consuming both sweeteners. There were no differences in median bacterial abundance (class or order) across consumers and nonconsumers of either sweetener. Overall bacterial diversity was different across nonconsumers and consumers of aspartame (P Bacterial abundance profiles and predicted gene function were not associated with recent dietary high-intensity sweetener consumption. However, bacterial diversity differed across consumers and nonconsumers. Given the increasing consumption of sweeteners and the role that the microbiome may have in chronic disease outcomes, work in further studies is warranted. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. A prospective investigation of predictive and modifiable risk factors for breast cancer in unaffected BRCA1 and BRCA2 gene carriers

    International Nuclear Information System (INIS)

    Guinan, Emer M; Hussey, Juliette; McGarrigle, Sarah A; Healy, Laura A; O’Sullivan, Jacintha N; Bennett, Kathleen; Connolly, Elizabeth M

    2013-01-01

    Breast cancer is the most common female cancer worldwide. The lifetime risk of a woman being diagnosed with breast cancer is approximately 12.5%. For women who carry the deleterious mutation in either of the BRCA genes, BRCA1 or BRCA2, the risk of developing breast or ovarian cancer is significantly increased. In recent years there has been increased penetrance of BRCA1- and BRCA2-associated breast cancer, prompting investigation into the role of modifiable risk factors in this group. Previous investigations into this topic have relied on participants recalling lifetime weight changes and subjective methods of recording physical activity. The influence of obesity-related biomarkers, which may explain the link between obesity, physical activity and breast cancer risk, has not been investigated prospectively in this group. This paper describes the design of a prospective cohort study investigating the role of predictive and modifiable risk factors for breast cancer in unaffected BRCA1 and BRCA2 gene mutation carriers. Participants will be recruited from breast cancer family risk clinics and genetics clinics. Lifestyle risk factors that will be investigated will include body composition, metabolic syndrome and its components, physical activity and dietary intake. PBMC telomere length will be measured as a potential predictor of breast cancer occurrence. Measurements will be completed on entry to the study and repeated at two years and five years. Participants will also be followed annually by questionnaire to track changes in risk factor status and to record cancer occurrence. Data will be analysed using multiple regression models. The study has an accrual target of 352 participants. The results from this study will provide valuable information regarding the role of modifiable lifestyle risk factors for breast cancer in women with a deleterious mutation in the BRCA gene. Additionally, the study will attempt to identify potential blood biomarkers which may be predictive

  5. A luciferase reporter gene assay and aryl hydrocarbon receptor 1 genotype predict the LD50 of polychlorinated biphenyls in avian species

    International Nuclear Information System (INIS)

    Manning, Gillian E.; Farmahin, Reza; Crump, Doug; Jones, Stephanie P.; Klein, Jeff; Konstantinov, Alex; Potter, Dave; Kennedy, Sean W.

    2012-01-01

    Birds differ in sensitivity to the embryotoxic effects of polychlorinated biphenyls (PCBs), which complicates environmental risk assessments for these chemicals. Recent research has shown that the identities of amino acid residues 324 and 380 in the avian aryl hydrocarbon receptor 1 (AHR1) ligand binding domain (LBD) are primarily responsible for differences in avian species sensitivity to selected dibenzo-p-dioxins and furans. A luciferase reporter gene (LRG) assay was developed in our laboratory to measure AHR1-mediated induction of a cytochrome P450 1A5 reporter gene in COS-7 cells transfected with different avian AHR1 constructs. In the present study, the LRG assay was used to measure the concentration-dependent effects of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), and PCBs 126, 77, 105 and 118 on luciferase activity in COS-7 cells transfected with AHR1 constructs representative of 86 avian species in order to predict their sensitivity to PCB-induced embryolethality and the relative potency of PCBs in these species. The results of the LRG assay indicate that the identities of amino acid residues 324 and 380 in the AHR1 LBD are the major determinants of avian species sensitivity to PCBs. The relative potency of PCBs did not differ greatly among AHR1 constructs. Luciferase activity was significantly correlated with embryolethality data obtained from the literature (R^2 ≥ 0.87, p < 0.0001). Thus, the LRG assay in combination with the knowledge of a species' AHR1 LBD sequence can be used to predict PCB-induced embryolethality in potentially any avian species of interest without the use of lethal methods on a large number of individuals. -- Highlights: ► PCB embryolethality in birds can be predicted from a species' AHR1 genotype. ► The reporter gene assay is useful for predicting species sensitivity to PCBs. ► The relative potency of PCBs does not appear to differ between AHR1 genotypes. ► Contamination of PCB 105 and PCB 118 did not affect their relative

  6. A luciferase reporter gene assay and aryl hydrocarbon receptor 1 genotype predict the LD50 of polychlorinated biphenyls in avian species

    Energy Technology Data Exchange (ETDEWEB)

    Manning, Gillian E., E-mail: gmann017@uottawa.ca [Centre for Advanced Research in Environmental Genomics, Department of Biology, University of Ottawa, Ottawa, ON, Canada K1N 6N5 (Canada); Environment Canada, National Wildlife Research Centre, Ottawa, ON, Canada K1A 0H3 (Canada); Farmahin, Reza, E-mail: mfarm070@uottawa.ca [Centre for Advanced Research in Environmental Genomics, Department of Biology, University of Ottawa, Ottawa, ON, Canada K1N 6N5 (Canada); Environment Canada, National Wildlife Research Centre, Ottawa, ON, Canada K1A 0H3 (Canada); Crump, Doug, E-mail: doug.crump@ec.gc.ca [Environment Canada, National Wildlife Research Centre, Ottawa, ON, Canada K1A 0H3 (Canada); Jones, Stephanie P., E-mail: stephanie.jones@ec.gc.ca [Environment Canada, National Wildlife Research Centre, Ottawa, ON, Canada K1A 0H3 (Canada); Klein, Jeff, E-mail: jeffery@well-labs.com [Wellington Laboratories Inc., Research Division, Guelph, ON, Canada N1G 3M5 (Canada); Konstantinov, Alex, E-mail: alex@well-labs.com [Wellington Laboratories Inc., Research Division, Guelph, ON, Canada N1G 3M5 (Canada); Potter, Dave, E-mail: dpotter@well-labs.com [Wellington Laboratories Inc., Research Division, Guelph, ON, Canada N1G 3M5 (Canada); Kennedy, Sean W., E-mail: sean.kennedy@ec.gc.ca [Centre for Advanced Research in Environmental Genomics, Department of Biology, University of Ottawa, Ottawa, ON, Canada K1N 6N5 (Canada); Environment Canada, National Wildlife Research Centre, Ottawa, ON, Canada K1A 0H3 (Canada)

    2012-09-15

    Birds differ in sensitivity to the embryotoxic effects of polychlorinated biphenyls (PCBs), which complicates environmental risk assessments for these chemicals. Recent research has shown that the identities of amino a