WorldWideScience

Sample records for based gene discovery

  1. Gene set-based module discovery in the breast cancer transcriptome

    Directory of Open Access Journals (Sweden)

    Zhang Michael Q

    2009-02-01

    Full Text Available Abstract Background Although microarray-based studies have revealed global view of gene expression in cancer cells, we still have little knowledge about regulatory mechanisms underlying the transcriptome. Several computational methods applied to yeast data have recently succeeded in identifying expression modules, which is defined as co-expressed gene sets under common regulatory mechanisms. However, such module discovery methods are not applied cancer transcriptome data. Results In order to decode oncogenic regulatory programs in cancer cells, we developed a novel module discovery method termed EEM by extending a previously reported module discovery method, and applied it to breast cancer expression data. Starting from seed gene sets prepared based on cis-regulatory elements, ChIP-chip data, and gene locus information, EEM identified 10 principal expression modules in breast cancer based on their expression coherence. Moreover, EEM depicted their activity profiles, which predict regulatory programs in each subtypes of breast tumors. For example, our analysis revealed that the expression module regulated by the Polycomb repressive complex 2 (PRC2 is downregulated in triple negative breast cancers, suggesting similarity of transcriptional programs between stem cells and aggressive breast cancer cells. We also found that the activity of the PRC2 expression module is negatively correlated to the expression of EZH2, a component of PRC2 which belongs to the E2F expression module. E2F-driven EZH2 overexpression may be responsible for the repression of the PRC2 expression modules in triple negative tumors. Furthermore, our network analysis predicts regulatory circuits in breast cancer cells. Conclusion These results demonstrate that the gene set-based module discovery approach is a powerful tool to decode regulatory programs in cancer cells.

  2. Canonical correlation analysis for gene-based pleiotropy discovery.

    Directory of Open Access Journals (Sweden)

    Jose A Seoane

    2014-10-01

    Full Text Available Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy and for testing multiple variants for association with a single phenotype (gene-based association tests. Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants respectively. Canonical Correlation Analysis (CCA measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. In order to do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study, we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1 with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels.

  3. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis

    Directory of Open Access Journals (Sweden)

    Saurav Mallik

    2017-12-01

    Full Text Available For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages to find causal effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures—weighted rank-based Jaccard and Cosine measures—and then propose a novel computational framework to detect condensed gene co-expression modules ( C o n G E M s through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm—RANWAR—was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.

  4. ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

    Science.gov (United States)

    Mallik, Saurav; Zhao, Zhongming

    2017-12-28

    For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages to find causal effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures-weighted rank-based Jaccard and Cosine measures-and then propose a novel computational framework to detect condensed gene co-expression modules ( C o n G E M s) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm-RANWAR-was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.

  5. Species-independent MicroRNA Gene Discovery

    KAUST Repository

    Kamanu, Timothy K.

    2012-12-01

    MicroRNA (miRNA) are a class of small endogenous non-coding RNA that are mainly negative transcriptional and post-transcriptional regulators in both plants and animals. Recent studies have shown that miRNA are involved in different types of cancer and other incurable diseases such as autism and Alzheimer’s. Functional miRNAs are excised from hairpin-like sequences that are known as miRNA genes. There are about 21,000 known miRNA genes, most of which have been determined using experimental methods. miRNA genes are classified into different groups (miRNA families). This study reports about 19,000 unknown miRNA genes in nine species whereby approximately 15,300 predictions were computationally validated to contain at least one experimentally verified functional miRNA product. The predictions are based on a novel computational strategy which relies on miRNA family groupings and exploits the physics and geometry of miRNA genes to unveil the hidden palindromic signals and symmetries in miRNA gene sequences. Unlike conventional computational miRNA gene discovery methods, the algorithm developed here is species-independent: it allows prediction at higher accuracy and resolution from arbitrary RNA/DNA sequences in any species and thus enables examination of repeat-prone genomic regions which are thought to be non-informative or ’junk’ sequences. The information non-redundancy of uni-directional RNA sequences compared to information redundancy of bi-directional DNA is demonstrated, a fact that is overlooked by most pattern discovery algorithms. A novel method for computing upstream and downstream miRNA gene boundaries based on mathematical/statistical functions is suggested, as well as cutoffs for annotation of miRNA genes in different miRNA families. Another tool is proposed to allow hypotheses generation and visualization of data matrices, intra- and inter-species chromosomal distribution of miRNA genes or miRNA families. Our results indicate that: miRNA and mi

  6. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

    Science.gov (United States)

    Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

    2011-08-01

    The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.

  7. GWATCH: a web platform for automated gene association discovery analysis

    Science.gov (United States)

    2014-01-01

    Background As genome-wide sequence analyses for complex human disease determinants are expanding, it is increasingly necessary to develop strategies to promote discovery and validation of potential disease-gene associations. Findings Here we present a dynamic web-based platform – GWATCH – that automates and facilitates four steps in genetic epidemiological discovery: 1) Rapid gene association search and discovery analysis of large genome-wide datasets; 2) Expanded visual display of gene associations for genome-wide variants (SNPs, indels, CNVs), including Manhattan plots, 2D and 3D snapshots of any gene region, and a dynamic genome browser illustrating gene association chromosomal regions; 3) Real-time validation/replication of candidate or putative genes suggested from other sources, limiting Bonferroni genome-wide association study (GWAS) penalties; 4) Open data release and sharing by eliminating privacy constraints (The National Human Genome Research Institute (NHGRI) Institutional Review Board (IRB), informed consent, The Health Insurance Portability and Accountability Act (HIPAA) of 1996 etc.) on unabridged results, which allows for open access comparative and meta-analysis. Conclusions GWATCH is suitable for both GWAS and whole genome sequence association datasets. We illustrate the utility of GWATCH with three large genome-wide association studies for HIV-AIDS resistance genes screened in large multicenter cohorts; however, association datasets from any study can be uploaded and analyzed by GWATCH. PMID:25374661

  8. SSHscreen and SSHdb, generic software for microarray based gene discovery: application to the stress response in cowpea

    Directory of Open Access Journals (Sweden)

    Oelofse Dean

    2010-04-01

    Full Text Available Abstract Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L. Walp. We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i to normalize the data effectively using spike-in control spot normalization, and (ii to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. The self-BLAST function within SSHdb grouped

  9. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Full Text Available Secondary metabolites (SMs produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic.

  10. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    Science.gov (United States)

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic. PMID:25706180

  11. Developing integrated crop knowledge networks to advance candidate gene discovery.

    Science.gov (United States)

    Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher

    2016-12-01

    The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.

  12. Genomics-Based Discovery of Plant Genes for Synthetic Biology of Terpenoid Fragrances: A Case Study in Sandalwood oil Biosynthesis.

    Science.gov (United States)

    Celedon, J M; Bohlmann, J

    2016-01-01

    Terpenoid fragrances are powerful mediators of ecological interactions in nature and have a long history of traditional and modern industrial applications. Plants produce a great diversity of fragrant terpenoid metabolites, which make them a superb source of biosynthetic genes and enzymes. Advances in fragrance gene discovery have enabled new approaches in synthetic biology of high-value speciality molecules toward applications in the fragrance and flavor, food and beverage, cosmetics, and other industries. Rapid developments in transcriptome and genome sequencing of nonmodel plant species have accelerated the discovery of fragrance biosynthetic pathways. In parallel, advances in metabolic engineering of microbial and plant systems have established platforms for synthetic biology applications of some of the thousands of plant genes that underlie fragrance diversity. While many fragrance molecules (eg, simple monoterpenes) are abundant in readily renewable plant materials, some highly valuable fragrant terpenoids (eg, santalols, ambroxides) are rare in nature and interesting targets for synthetic biology. As a representative example for genomics/transcriptomics enabled gene and enzyme discovery, we describe a strategy used successfully for elucidation of a complete fragrance biosynthetic pathway in sandalwood (Santalum album) and its reconstruction in yeast (Saccharomyces cerevisiae). We address questions related to the discovery of specific genes within large gene families and recovery of rare gene transcripts that are selectively expressed in recalcitrant tissues. To substantiate the validity of the approaches, we describe the combination of methods used in the gene and enzyme discovery of a cytochrome P450 in the fragrant heartwood of tropical sandalwood, responsible for the fragrance defining, final step in the biosynthesis of (Z)-santalols. © 2016 Elsevier Inc. All rights reserved.

  13. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

    Science.gov (United States)

    Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

    2010-01-18

    The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.

  14. iSyTE 2.0: a database for expression-based gene discovery in the eye

    Science.gov (United States)

    Kakrana, Atul; Yang, Andrian; Anand, Deepti; Djordjevic, Djordje; Ramachandruni, Deepti; Singh, Abhyudai; Huang, Hongzhan

    2018-01-01

    Abstract Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches. PMID:29036527

  15. A brief history of Alzheimer's disease gene discovery.

    Science.gov (United States)

    Tanzi, Rudolph E

    2013-01-01

    The rich and colorful history of gene discovery in Alzheimer's disease (AD) over the past three decades is as complex and heterogeneous as the disease, itself. Twin and family studies indicate that genetic factors are estimated to play a role in at least 80% of AD cases. The inheritance of AD exhibits a dichotomous pattern. On one hand, rare mutations inAPP, PSEN1, and PSEN2 are fully penetrant for early-onset (95%) late-onset AD. These four genes account for 30-50% of the inheritability of AD. Genome-wide association studies have recently led to the identification of additional highly confirmed AD candidate genes. Here, I review the past, present, and future of attempts to elucidate the complex and heterogeneous genetic underpinnings of AD along with some of the unique events that made these discoveries possible.

  16. Speeding disease gene discovery by sequence based candidate prioritization

    Directory of Open Access Journals (Sweden)

    Porteous David J

    2005-03-01

    Full Text Available Abstract Background Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. Results We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. Conclusion PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.

  17. Alternative Polyadenylation Patterns for Novel Gene Discovery and Classification in Cancer

    Directory of Open Access Journals (Sweden)

    Oguzhan Begik

    2017-07-01

    Full Text Available Certain aspects of diagnosis, prognosis, and treatment of cancer patients are still important challenges to be addressed. Therefore, we propose a pipeline to uncover patterns of alternative polyadenylation (APA, a hidden complexity in cancer transcriptomes, to further accelerate efforts to discover novel cancer genes and pathways. Here, we analyzed expression data for 1045 cancer patients and found a significant shift in usage of poly(A signals in common tumor types (breast, colon, lung, prostate, gastric, and ovarian compared to normal tissues. Using machine-learning techniques, we further defined specific subsets of APA events to efficiently classify cancer types. Furthermore, APA patterns were associated with altered protein levels in patients, revealed by antibody-based profiling data, suggesting functional significance. Overall, our study offers a computational approach for use of APA in novel gene discovery and classification in common tumor types, with important implications in basic research, biomarker discovery, and precision medicine approaches.

  18. Biomarker discovery for colon cancer using a 761 gene RT-PCR assay

    Directory of Open Access Journals (Sweden)

    Hackett James R

    2007-08-01

    Full Text Available Abstract Background Reverse transcription PCR (RT-PCR is widely recognized to be the gold standard method for quantifying gene expression. Studies using RT-PCR technology as a discovery tool have historically been limited to relatively small gene sets compared to other gene expression platforms such as microarrays. We have recently shown that TaqMan® RT-PCR can be scaled up to profile expression for 192 genes in fixed paraffin-embedded (FPE clinical study tumor specimens. This technology has also been used to develop and commercialize a widely used clinical test for breast cancer prognosis and prediction, the Onco typeDX™ assay. A similar need exists in colon cancer for a test that provides information on the likelihood of disease recurrence in colon cancer (prognosis and the likelihood of tumor response to standard chemotherapy regimens (prediction. We have now scaled our RT-PCR assay to efficiently screen 761 biomarkers across hundreds of patient samples and applied this process to biomarker discovery in colon cancer. This screening strategy remains attractive due to the inherent advantages of maintaining platform consistency from discovery through clinical application. Results RNA was extracted from formalin fixed paraffin embedded (FPE tissue, as old as 28 years, from 354 patients enrolled in NSABP C-01 and C-02 colon cancer studies. Multiplexed reverse transcription reactions were performed using a gene specific primer pool containing 761 unique primers. PCR was performed as independent TaqMan® reactions for each candidate gene. Hierarchal clustering demonstrates that genes expected to co-express form obvious, distinct and in certain cases very tightly correlated clusters, validating the reliability of this technical approach to biomarker discovery. Conclusion We have developed a high throughput, quantitatively precise multi-analyte gene expression platform for biomarker discovery that approaches low density DNA arrays in numbers of

  19. A new essential protein discovery method based on the integration of protein-protein interaction and gene expression data

    Directory of Open Access Journals (Sweden)

    Li Min

    2012-03-01

    Full Text Available Abstract Background Identification of essential proteins is always a challenging task since it requires experimental approaches that are time-consuming and laborious. With the advances in high throughput technologies, a large number of protein-protein interactions are available, which have produced unprecedented opportunities for detecting proteins' essentialities from the network level. There have been a series of computational approaches proposed for predicting essential proteins based on network topologies. However, the network topology-based centrality measures are very sensitive to the robustness of network. Therefore, a new robust essential protein discovery method would be of great value. Results In this paper, we propose a new centrality measure, named PeC, based on the integration of protein-protein interaction and gene expression data. The performance of PeC is validated based on the protein-protein interaction network of Saccharomyces cerevisiae. The experimental results show that the predicted precision of PeC clearly exceeds that of the other fifteen previously proposed centrality measures: Degree Centrality (DC, Betweenness Centrality (BC, Closeness Centrality (CC, Subgraph Centrality (SC, Eigenvector Centrality (EC, Information Centrality (IC, Bottle Neck (BN, Density of Maximum Neighborhood Component (DMNC, Local Average Connectivity-based method (LAC, Sum of ECC (SoECC, Range-Limited Centrality (RL, L-index (LI, Leader Rank (LR, Normalized α-Centrality (NC, and Moduland-Centrality (MC. Especially, the improvement of PeC over the classic centrality measures (BC, CC, SC, EC, and BN is more than 50% when predicting no more than 500 proteins. Conclusions We demonstrate that the integration of protein-protein interaction network and gene expression data can help improve the precision of predicting essential proteins. The new centrality measure, PeC, is an effective essential protein discovery method.

  20. Discovery of cancer common and specific driver gene sets

    Science.gov (United States)

    2017-01-01

    Abstract Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295

  1. IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to

  2. Discovery of Cationic Polymers for Non-viral Gene Delivery using Combinatorial Approaches

    Science.gov (United States)

    Barua, Sutapa; Ramos, James; Potta, Thrimoorthy; Taylor, David; Huang, Huang-Chiao; Montanez, Gabriela; Rege, Kaushal

    2015-01-01

    Gene therapy is an attractive treatment option for diseases of genetic origin, including several cancers and cardiovascular diseases. While viruses are effective vectors for delivering exogenous genes to cells, concerns related to insertional mutagenesis, immunogenicity, lack of tropism, decay and high production costs necessitate the discovery of non-viral methods. Significant efforts have been focused on cationic polymers as non-viral alternatives for gene delivery. Recent studies have employed combinatorial syntheses and parallel screening methods for enhancing the efficacy of gene delivery, biocompatibility of the delivery vehicle, and overcoming cellular level barriers as they relate to polymer-mediated transgene uptake, transport, transcription, and expression. This review summarizes and discusses recent advances in combinatorial syntheses and parallel screening of cationic polymer libraries for the discovery of efficient and safe gene delivery systems. PMID:21843141

  3. Comprehensive Clinical Phenotyping and Genetic Mapping for the Discovery of Autism Susceptibility Genes

    Science.gov (United States)

    2013-03-14

    behavioral teaching strategies and best practice for teaching students with autism spectrum disorders 4.52 Learn strategies for incorporating IEP goals...AFRL-SA-WP-TR-2013-0013 Comprehensive Clinical Phenotyping and Genetic Mapping for the Discovery of Autism Susceptibility Genes...Genetic Mapping for the Discovery of Autism Susceptibility Genes 5a. CONTRACT NUMBER N/A 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER N/A 6

  4. Improving functional modules discovery by enriching interaction networks with gene profiles

    KAUST Repository

    Salem, Saeed

    2013-05-01

    Recent advances in proteomic and transcriptomic technologies resulted in the accumulation of vast amount of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional modules discovery where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most of the existing module discovery methods use only the interaction data, in this work we propose, CLARM, which discovers biological modules by incorporating gene profiles data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on Yeast and Human interaction datasets, and gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive to well established functional module discovery methods.

  5. Traditional Chinese Medicine-Based Network Pharmacology Could Lead to New Multicompound Drug Discovery

    Directory of Open Access Journals (Sweden)

    Jian Li

    2012-01-01

    Full Text Available Current strategies for drug discovery have reached a bottleneck where the paradigm is generally “one gene, one drug, one disease.” However, using holistic and systemic views, network pharmacology may be the next paradigm in drug discovery. Based on network pharmacology, a combinational drug with two or more compounds could offer beneficial synergistic effects for complex diseases. Interestingly, traditional chinese medicine (TCM has been practicing holistic views for over 3,000 years, and its distinguished feature is using herbal formulas to treat diseases based on the unique pattern classification. Though TCM herbal formulas are acknowledged as a great source for drug discovery, no drug discovery strategies compatible with the multidimensional complexities of TCM herbal formulas have been developed. In this paper, we highlighted some novel paradigms in TCM-based network pharmacology and new drug discovery. A multiple compound drug can be discovered by merging herbal formula-based pharmacological networks with TCM pattern-based disease molecular networks. Herbal formulas would be a source for multiple compound drug candidates, and the TCM pattern in the disease would be an indication for a new drug.

  6. KBERG: KnowledgeBase for Estrogen Responsive Genes

    DEFF Research Database (Denmark)

    Tang, Suisheng; Zhang, Zhuo; Tan, Sin Lam

    2007-01-01

    Estrogen has a profound impact on human physiology affecting transcription of numerous genes. To decipher functional characteristics of estrogen responsive genes, we developed KnowledgeBase for Estrogen Responsive Genes (KBERG). Genes in KBERG were derived from Estrogen Responsive Gene Database...... (ERGDB) and were analyzed from multiple aspects. We explored the possible transcription regulation mechanism by capturing highly conserved promoter motifs across orthologous genes, using promoter regions that cover the range of [-1200, +500] relative to the transcription start sites. The motif detection...... is based on ab initio discovery of common cis-elements from the orthologous gene cluster from human, mouse and rat, thus reflecting a degree of promoter sequence preservation during evolution. The identified motifs are linked to transcription factor binding sites based on the TRANSFAC database. In addition...

  7. Cross-pollination of research findings, although uncommon, may accelerate discovery of human disease genes

    Directory of Open Access Journals (Sweden)

    Duda Marlena

    2012-11-01

    Full Text Available Abstract Background Technological leaps in genome sequencing have resulted in a surge in discovery of human disease genes. These discoveries have led to increased clarity on the molecular pathology of disease and have also demonstrated considerable overlap in the genetic roots of human diseases. In light of this large genetic overlap, we tested whether cross-disease research approaches lead to faster, more impactful discoveries. Methods We leveraged several gene-disease association databases to calculate a Mutual Citation Score (MCS for 10,853 pairs of genetically related diseases to measure the frequency of cross-citation between research fields. To assess the importance of cooperative research, we computed an Individual Disease Cooperation Score (ICS and the average publication rate for each disease. Results For all disease pairs with one gene in common, we found that the degree of genetic overlap was a poor predictor of cooperation (r2=0.3198 and that the vast majority of disease pairs (89.56% never cited previous discoveries of the same gene in a different disease, irrespective of the level of genetic similarity between the diseases. A fraction (0.25% of the pairs demonstrated cross-citation in greater than 5% of their published genetic discoveries and 0.037% cross-referenced discoveries more than 10% of the time. We found strong positive correlations between ICS and publication rate (r2=0.7931, and an even stronger correlation between the publication rate and the number of cross-referenced diseases (r2=0.8585. These results suggested that cross-disease research may have the potential to yield novel discoveries at a faster pace than singular disease research. Conclusions Our findings suggest that the frequency of cross-disease study is low despite the high level of genetic similarity among many human diseases, and that collaborative methods may accelerate and increase the impact of new genetic discoveries. Until we have a better

  8. Biomarker Gene Signature Discovery Integrating Network Knowledge

    Directory of Open Access Journals (Sweden)

    Holger Fröhlich

    2012-02-01

    Full Text Available Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.

  9. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes.

    Science.gov (United States)

    Hassani-Pak, Keywan; Rawlings, Christopher

    2017-06-13

    Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

  10. Resource Discovery in Activity-Based Sensor Networks

    DEFF Research Database (Denmark)

    Bucur, Doina; Bardram, Jakob

    This paper proposes a service discovery protocol for sensor networks that is specifically tailored for use in humancentered pervasive environments. It uses the high-level concept of computational activities (as logical bundles of data and resources) to give sensors in Activity-Based Sensor Networ....... ABSN enhances the generic Extended Zone Routing Protocol with logical sensor grouping and greatly lowers network overhead during the process of discovery, while keeping discovery latency close to optimal.......This paper proposes a service discovery protocol for sensor networks that is specifically tailored for use in humancentered pervasive environments. It uses the high-level concept of computational activities (as logical bundles of data and resources) to give sensors in Activity-Based Sensor Networks...... (ABSNs) knowledge about their usage even at the network layer. ABSN redesigns classical network-level service discovery protocols to include and use this logical structuring of the network for a more practically applicable service discovery scheme. Noting that in practical settings activity-based sensor...

  11. Discovery of new candidate genes for rheumatoid arthritis through integration of genetic association data with expression pathway analysis.

    Science.gov (United States)

    Shchetynsky, Klementy; Diaz-Gallo, Lina-Marcella; Folkersen, Lasse; Hensvold, Aase Haj; Catrina, Anca Irinel; Berg, Louise; Klareskog, Lars; Padyukov, Leonid

    2017-02-02

    Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis for discovery of new candidate genes and signaling networks, relevant for rheumatoid arthritis (RA). RNA-sequencing-(RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, non-treated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in treated and untreated RA sub-groups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and in 35 healthy controls. There were 11 qualifying genes selected for pathway analysis and these were grouped into two evidence-based functional networks, containing 29 and 27 additional connector molecules. The expression of genes, corresponding to connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in both treated and non-treated patients with RA and an additional nine genes were differentially expressed in at least one group of patients compared to healthy controls. The ERBB2, TP53. THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells from healthy controls and non-treated patients with RA, in an independent collection of samples. Integration of RNA-seq data with findings from association studies, and consequent pathway analysis implicate new candidate genes, ERBB2, TP53 and THOP1 in the pathogenesis of RA.

  12. Automated discovery of functional generality of human gene expression programs.

    Directory of Open Access Journals (Sweden)

    Georg K Gerber

    2007-08-01

    Full Text Available An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-kappaB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal

  13. Solution NMR Spectroscopy in Target-Based Drug Discovery.

    Science.gov (United States)

    Li, Yan; Kang, Congbao

    2017-08-23

    Solution NMR spectroscopy is a powerful tool to study protein structures and dynamics under physiological conditions. This technique is particularly useful in target-based drug discovery projects as it provides protein-ligand binding information in solution. Accumulated studies have shown that NMR will play more and more important roles in multiple steps of the drug discovery process. In a fragment-based drug discovery process, ligand-observed and protein-observed NMR spectroscopy can be applied to screen fragments with low binding affinities. The screened fragments can be further optimized into drug-like molecules. In combination with other biophysical techniques, NMR will guide structure-based drug discovery. In this review, we describe the possible roles of NMR spectroscopy in drug discovery. We also illustrate the challenges encountered in the drug discovery process. We include several examples demonstrating the roles of NMR in target-based drug discoveries such as hit identification, ranking ligand binding affinities, and mapping the ligand binding site. We also speculate the possible roles of NMR in target engagement based on recent processes in in-cell NMR spectroscopy.

  14. Functional principles of registry-based service discovery

    NARCIS (Netherlands)

    Sundramoorthy, V.; Tan, C.; Hartel, P.H.; Hartog, den J.I.; Scholten, J.

    2005-01-01

    As Service Discovery Protocols (SDP) are becoming increasingly important for ubiquitous computing, they must behave according to predefined principles. We present the functional Principles of Service Discovery for robust, registry-based service discovery. A methodology to guarantee adherence to

  15. Independent Gene Discovery and Testing

    Science.gov (United States)

    Palsule, Vrushalee; Coric, Dijana; Delancy, Russell; Dunham, Heather; Melancon, Caleb; Thompson, Dennis; Toms, Jamie; White, Ashley; Shultz, Jeffry

    2010-01-01

    A clear understanding of basic gene structure is critical when teaching molecular genetics, the central dogma and the biological sciences. We sought to create a gene-based teaching project to improve students' understanding of gene structure and to integrate this into a research project that can be implemented by instructors at the secondary level…

  16. Evolutionary signatures amongst disease genes permit novel methods for gene prioritization and construction of informative gene-based networks.

    Directory of Open Access Journals (Sweden)

    Nolan Priedigkeit

    2015-02-01

    Full Text Available Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC, is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases posses an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting "disease map" network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.

  17. MAGIC Database and Interfaces: An Integrated Package for Gene Discovery and Expression

    Directory of Open Access Journals (Sweden)

    Lee H. Pratt

    2006-03-01

    Full Text Available The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs, and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.

  18. A comparative review of estimates of the proportion unchanged genes and the false discovery rate

    Directory of Open Access Journals (Sweden)

    Broberg Per

    2005-08-01

    Full Text Available Abstract Background In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion unchanged and the false discovery rate (FDR and how to make inferences based on these concepts. Six published methods for estimating the proportion unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value is illustrated with examples. Five published estimates of the FDR and one new are presented and tested. Implementations in R code are available. Results A simulation model based on the distribution of real microarray data plus two real data sets were used to assess the methods. The proposed alternative methods for estimating the proportion unchanged fared very well, and gave evidence of low bias and very low variance. Different methods perform well depending upon whether there are few or many regulated genes. Furthermore, the methods for estimating FDR showed a varying performance, and were sometimes misleading. The new method had a very low error. Conclusion The concept of the q-value or false discovery rate is useful in practical research, despite some theoretical and practical shortcomings. However, it seems possible to challenge the performance of the published methods, and there is likely scope for further developing the estimates of the FDR. The new methods provide the scientist with more options to choose a suitable method for any particular experiment. The article advocates the use of the conjoint information

  19. Resource Discovery in Activity-Based Sensor Networks

    DEFF Research Database (Denmark)

    Bucur, Doina; Bardram, Jakob

    This paper proposes a service discovery protocol for sensor networks that is specifically tailored for use in humancentered pervasive environments. It uses the high-level concept of computational activities (as logical bundles of data and resources) to give sensors in Activity-Based Sensor Networks...... (ABSNs) knowledge about their usage even at the network layer. ABSN redesigns classical network-level service discovery protocols to include and use this logical structuring of the network for a more practically applicable service discovery scheme. Noting that in practical settings activity-based sensor...

  20. RNA-Seq analysis and gene discovery of Andrias davidianus using Illumina short read sequencing.

    Directory of Open Access Journals (Sweden)

    Fenggang Li

    Full Text Available The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp. Based on the sequence similarities, searching with known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 of transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR. The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.

  1. Link-based quantitative methods to identify differentially coexpressed genes and gene Pairs

    Directory of Open Access Journals (Sweden)

    Ye Zhi-Qiang

    2011-08-01

    Full Text Available Abstract Background Differential coexpression analysis (DCEA is increasingly used for investigating the global transcriptional mechanisms underlying phenotypic changes. Current DCEA methods mostly adopt a gene connectivity-based strategy to estimate differential coexpression, which is characterized by comparing the numbers of gene neighbors in different coexpression networks. Although it simplifies the calculation, this strategy mixes up the identities of different coexpression neighbors of a gene, and fails to differentiate significant differential coexpression changes from those trivial ones. Especially, the correlation-reversal is easily missed although it probably indicates remarkable biological significance. Results We developed two link-based quantitative methods, DCp and DCe, to identify differentially coexpressed genes and gene pairs (links. Bearing the uniqueness of exploiting the quantitative coexpression change of each gene pair in the coexpression networks, both methods proved to be superior to currently popular methods in simulation studies. Re-mining of a publicly available type 2 diabetes (T2D expression dataset from the perspective of differential coexpression analysis led to additional discoveries than those from differential expression analysis. Conclusions This work pointed out the critical weakness of current popular DCEA methods, and proposed two link-based DCEA algorithms that will make contribution to the development of DCEA and help extend it to a broader spectrum.

  2. Introduction to fragment-based drug discovery.

    Science.gov (United States)

    Erlanson, Daniel A

    2012-01-01

    Fragment-based drug discovery (FBDD) has emerged in the past decade as a powerful tool for discovering drug leads. The approach first identifies starting points: very small molecules (fragments) that are about half the size of typical drugs. These fragments are then expanded or linked together to generate drug leads. Although the origins of the technique date back some 30 years, it was only in the mid-1990s that experimental techniques became sufficiently sensitive and rapid for the concept to be become practical. Since that time, the field has exploded: FBDD has played a role in discovery of at least 18 drugs that have entered the clinic, and practitioners of FBDD can be found throughout the world in both academia and industry. Literally dozens of reviews have been published on various aspects of FBDD or on the field as a whole, as have three books (Jahnke and Erlanson, Fragment-based approaches in drug discovery, 2006; Zartler and Shapiro, Fragment-based drug discovery: a practical approach, 2008; Kuo, Fragment based drug design: tools, practical approaches, and examples, 2011). However, this chapter will assume that the reader is approaching the field with little prior knowledge. It will introduce some of the key concepts, set the stage for the chapters to follow, and demonstrate how X-ray crystallography plays a central role in fragment identification and advancement.

  3. Using concepts in literature-based discovery : Simulating Swanson's Raynaud-fish oil and migraine-magnesium discoveries

    NARCIS (Netherlands)

    Weeber, M; Klein, Henny; de Jong-van den Berg, LTW; Vos, R

    Literature-based discovery has resulted in new knowledge. In the biomedical context, Don R. Swanson has generated several literature-based hypotheses that have been corroborated experimentally and clinically. In this paper, we propose a two-step model of the discovery process in which hypotheses are

  4. Discovery of Putative Herbicide Resistance Genes and Its Regulatory Network in Chickpea Using Transcriptome Sequencing

    Directory of Open Access Journals (Sweden)

    Mir A. Iquebal

    2017-06-01

    Full Text Available Background: Chickpea (Cicer arietinum L. contributes 75% of total pulse production. Being cheaper than animal protein, makes it important in dietary requirement of developing countries. Weed not only competes with chickpea resulting into drastic yield reduction but also creates problem of harboring fungi, bacterial diseases and insect pests. Chemical approach having new herbicide discovery has constraint of limited lead molecule options, statutory regulations and environmental clearance. Through genetic approach, transgenic herbicide tolerant crop has given successful result but led to serious concern over ecological safety thus non-transgenic approach like marker assisted selection is desirable. Since large variability in tolerance limit of herbicide already exists in chickpea varieties, thus the genes offering herbicide tolerance can be introgressed in variety improvement programme. Transcriptome studies can discover such associated key genes with herbicide tolerance in chickpea.Results: This is first transcriptomic studies of chickpea or even any legume crop using two herbicide susceptible and tolerant genotypes exposed to imidazoline (Imazethapyr. Approximately 90 million paired-end reads generated from four samples were processed and assembled into 30,803 contigs using reference based assembly. We report 6,310 differentially expressed genes (DEGs, of which 3,037 were regulated by 980 miRNAs, 1,528 transcription factors associated with 897 DEGs, 47 Hub proteins, 3,540 putative Simple Sequence Repeat-Functional Domain Marker (SSR-FDM, 13,778 genic Single Nucleotide Polymorphism (SNP putative markers and 1,174 Indels. Randomly selected 20 DEGs were validated using qPCR. Pathway analysis suggested that xenobiotic degradation related gene, glutathione S-transferase (GST were only up-regulated in presence of herbicide. Down-regulation of DNA replication genes and up-regulation of abscisic acid pathway genes were observed. Study further reveals

  5. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  6. Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling

    Directory of Open Access Journals (Sweden)

    Guo Zheng

    2006-01-01

    Full Text Available Abstract Background It is one of the ultimate goals for modern biological research to fully elucidate the intricate interplays and the regulations of the molecular determinants that propel and characterize the progression of versatile life phenomena, to name a few, cell cycling, developmental biology, aging, and the progressive and recurrent pathogenesis of complex diseases. The vast amount of large-scale and genome-wide time-resolved data is becoming increasing available, which provides the golden opportunity to unravel the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results In particular, this methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network to address the underlying regulations of genes that can span any unit(s of time intervals. This bioinformatics toolbox has provided a unified approach to uncovering time trends of gene regulations through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with the current knowledge on phase characteristics for the cell cyclings. Conclusion We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulations for cell cycling, this unified approach can also be used to study the complex

  7. Comprehensive Clinical Phenotyping & Genetic Mapping for the Discovery of Autism Susceptibility Genes

    Science.gov (United States)

    2012-12-05

    teaching students with autism spectrum disorders 4.52 Learn strategies for incorporating IEP goals and district standard into daily teaching...W403 Columbus, OH 43205 Final Report Comprehensive Clinical Phenotyping & Genetic Mapping for the Discovery of Autism Susceptibility Genes...QFOXGHDUHDFRGH 1.0 Summary In 2006, the Central Ohio Registry for Autism (CORA) was initiated as a collaboration between Wright-Patterson Air

  8. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    Science.gov (United States)

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p neurona: , , , , , , , , , , , , , –, –, –, –, –. Eimeria tenella: –, –, –, –, –, –, –, –, – , –, –, –, –, –, –, –, –, –, –, –. Neospora caninum: –, –, , – , –, –.] PMID:12618375

  9. Discovery of a Novel Inhibitor of the Hedgehog Signaling Pathway through Cell-based Compound Discovery and Target Prediction.

    Science.gov (United States)

    Kremer, Lea; Schultz-Fademrecht, Carsten; Baumann, Matthias; Habenberger, Peter; Choidas, Axel; Klebl, Bert; Kordes, Susanne; Schöler, Hans R; Sterneckert, Jared; Ziegler, Slava; Schneider, Gisbert; Waldmann, Herbert

    2017-10-09

    Cell-based assays enable monitoring of small-molecule bioactivity in a target-agnostic manner and help uncover new biological mechanisms. Subsequent identification and validation of the small-molecule targets, typically employing proteomics techniques, is very challenging and limited, in particular if the targets are membrane proteins. Herein, we demonstrate that the combination of cell-based bioactive-compound discovery with cheminformatic target prediction may provide an efficient approach to accelerate the process and render target identification and validation more efficient. Using a cell-based assay, we identified the pyrazolo-imidazole smoothib as a new inhibitor of hedgehog (Hh) signaling and an antagonist of the protein smoothened (SMO) with a novel chemotype. Smoothib targets the heptahelical bundle of SMO, prevents its ciliary localization, reduces the expression of Hh target genes, and suppresses the growth of Ptch +/- medulloblastoma cells. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Pine Gene Discovery Project - Final Report - 08/31/1997 - 02/28/2001; FINAL

    International Nuclear Information System (INIS)

    Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.; Retzel, E.

    2001-01-01

    Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community

  11. Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.

    Science.gov (United States)

    Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis

    2014-12-01

    Proteogenomics involves the use of MS to refine annotation of protein-coding genes and discover genes in a genome. We carried out comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) from publicly available proteomics data with a motive to improve annotation for methylotrophs; organisms capable of surviving in reduced carbon compounds such as methanol. Besides identifying 2482(50%) proteins, 29 new genes were discovered and 66 annotated gene models were revised in ME-AM1 genome. One such novel gene is identified with 75 peptides, lacks homolog in other methylobacteria but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating its potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and low GC-content of few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy related genes were also improved by the discovery of a short gene in methylotrophy gene island and redefining a gene important for pyrroquinoline quinone synthesis, essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in model methylotroph ME-AM1 genome. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Exome sequencing for gene discovery in lethal fetal disorders--harnessing the value of extreme phenotypes.

    Science.gov (United States)

    Filges, Isabel; Friedman, Jan M

    2015-10-01

    Massively parallel sequencing has revolutionized our understanding of Mendelian disorders, and many novel genes have been discovered to cause disease phenotypes when mutant. At the same time, next-generation sequencing approaches have enabled non-invasive prenatal testing of free fetal DNA in maternal blood. However, little attention has been paid to using whole exome and genome sequencing strategies for gene identification in fetal disorders that are lethal in utero, because they can appear to be sporadic and Mendelian inheritance may be missed. We present challenges and advantages of applying next-generation sequencing approaches to gene discovery in fetal malformation phenotypes and review recent successful discovery approaches. We discuss the implication and significance of recessive inheritance and cross-species phenotyping in fetal lethal conditions. Whole exome sequencing can be used in individual families with undiagnosed lethal congenital anomaly syndromes to discover causal mutations, provided that prior to data analysis, the fetal phenotype can be correlated to a particular developmental pathway in embryogenesis. Cross-species phenotyping allows providing further evidence for causality of discovered variants in genes involved in those extremely rare phenotypes and will increase our knowledge about normal and abnormal human developmental processes. Ultimately, families will benefit from the option of early prenatal diagnosis. © 2014 John Wiley & Sons, Ltd.

  13. Mass Spectrometry-Based Biomarker Discovery.

    Science.gov (United States)

    Zhou, Weidong; Petricoin, Emanuel F; Longo, Caterina

    2017-01-01

    The discovery of candidate biomarkers within the entire proteome is one of the most important and challenging goals in proteomic research. Mass spectrometry-based proteomics is a modern and promising technology for semiquantitative and qualitative assessment of proteins, enabling protein sequencing and identification with exquisite accuracy and sensitivity. For mass spectrometry analysis, protein extractions from tissues or body fluids and subsequent protein fractionation represent an important and unavoidable step in the workflow for biomarker discovery. Following extraction of proteins, the protein mixture must be digested, reduced, alkylated, and cleaned up prior to mass spectrometry. The aim of our chapter is to provide comprehensible and practical lab procedures for sample digestion, protein fractionation, and subsequent mass spectrometry analysis.

  14. A hybrid network-based method for the detection of disease-related genes

    Science.gov (United States)

    Cui, Ying; Cai, Meng; Dai, Yang; Stanley, H. Eugene

    2018-02-01

    Detecting disease-related genes is crucial in disease diagnosis and drug design. The accepted view is that neighbors of a disease-causing gene in a molecular network tend to cause the same or similar diseases, and network-based methods have been recently developed to identify novel hereditary disease-genes in available biomedical networks. Despite the steady increase in the discovery of disease-associated genes, there is still a large fraction of disease genes that remains under the tip of the iceberg. In this paper we exploit the topological properties of the protein-protein interaction (PPI) network to detect disease-related genes. We compute, analyze, and compare the topological properties of disease genes with non-disease genes in PPI networks. We also design an improved random forest classifier based on these network topological features, and a cross-validation test confirms that our method performs better than previous similar studies.

  15. Exploiting Pre-rRNA Processing in Diamond Blackfan Anemia Gene Discovery and Diagnosis

    Science.gov (United States)

    Farrar, Jason E.; Quarello, Paola; Fisher, Ross; O’Brien, Kelly A.; Aspesi, Anna; Parrella, Sara; Henson, Adrianna L.; Seidel, Nancy E.; Atsidaftos, Eva; Prakash, Supraja; Bari, Shahla; Garelli, Emanuela; Arceci, Robert J.; Dianzani, Irma; Ramenghi, Ugo; Vlachos, Adrianna; Lipton, Jeffrey M.; Bodine, David M.; Ellis, Steven R.

    2014-01-01

    Diamond Blackfan anemia (DBA), a syndrome primarily characterized by anemia and physical abnormalities, is one among a group of related inherited bone marrow failure syndromes (IBMFS) which share overlapping clinical features. Heterozygous mutations or single-copy deletions have been identified in 12 ribosomal protein genes in approximately 60% of DBA cases, with the genetic etiology unexplained in most remaining patients. Unlike many IBMFS, for which functional screening assays complement clinical and genetic findings, suspected DBA in the absence of typical alterations of the known genes must frequently be diagnosed after exclusion of other IBMFS. We report here a novel deletion in a child that presented such a diagnostic challenge and prompted development of a novel functional assay that can assist in the diagnosis of a significant fraction of patients with DBA. The ribosomal proteins affected in DBA are required for pre-rRNA processing, a process which can be interrogated to monitor steps in the maturation of 40S and 60S ribosomal subunits. In contrast to prior methods used to assess pre-rRNA processing, the assay reported here, based on capillary electrophoresis measurement of the maturation of rRNA in pre-60S ribosomal subunits, would be readily amenable to use in diagnostic laboratories. In addition to utility as a diagnostic tool, we applied this technique to gene discovery in DBA, resulting in the identification of RPL31 as a novel DBA gene. PMID:25042156

  16. Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium.

    Science.gov (United States)

    Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad

    2016-09-01

    Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. © 2016 Li et al.; Published by Cold Spring Harbor Laboratory Press.

  17. Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.

    Science.gov (United States)

    Yip, Shun H; Sham, Pak Chung; Wang, Junwen

    2018-02-21

    Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.

  18. Biomarker discovery in mass spectrometry-based urinary proteomics.

    Science.gov (United States)

    Thomas, Samuel; Hao, Ling; Ricke, William A; Li, Lingjun

    2016-04-01

    Urinary proteomics has become one of the most attractive topics in disease biomarker discovery. MS-based proteomic analysis has advanced continuously and emerged as a prominent tool in the field of clinical bioanalysis. However, only few protein biomarkers have made their way to validation and clinical practice. Biomarker discovery is challenged by many clinical and analytical factors including, but not limited to, the complexity of urine and the wide dynamic range of endogenous proteins in the sample. This article highlights promising technologies and strategies in the MS-based biomarker discovery process, including study design, sample preparation, protein quantification, instrumental platforms, and bioinformatics. Different proteomics approaches are discussed, and progresses in maximizing urinary proteome coverage and standardization are emphasized in this review. MS-based urinary proteomics has great potential in the development of noninvasive diagnostic assays in the future, which will require collaborative efforts between analytical scientists, systems biologists, and clinicians. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Maximizing biomarker discovery by minimizing gene signatures

    Directory of Open Access Journals (Sweden)

    Chang Chang

    2011-12-01

    Full Text Available Abstract Background The use of gene signatures can potentially be of considerable value in the field of clinical diagnosis. However, gene signatures defined with different methods can be quite various even when applied the same disease and the same endpoint. Previous studies have shown that the correct selection of subsets of genes from microarray data is key for the accurate classification of disease phenotypes, and a number of methods have been proposed for the purpose. However, these methods refine the subsets by only considering each single feature, and they do not confirm the association between the genes identified in each gene signature and the phenotype of the disease. We proposed an innovative new method termed Minimize Feature's Size (MFS based on multiple level similarity analyses and association between the genes and disease for breast cancer endpoints by comparing classifier models generated from the second phase of MicroArray Quality Control (MAQC-II, trying to develop effective meta-analysis strategies to transform the MAQC-II signatures into a robust and reliable set of biomarker for clinical applications. Results We analyzed the similarity of the multiple gene signatures in an endpoint and between the two endpoints of breast cancer at probe and gene levels, the results indicate that disease-related genes can be preferably selected as the components of gene signature, and that the gene signatures for the two endpoints could be interchangeable. The minimized signatures were built at probe level by using MFS for each endpoint. By applying the approach, we generated a much smaller set of gene signature with the similar predictive power compared with those gene signatures from MAQC-II. Conclusions Our results indicate that gene signatures of both large and small sizes could perform equally well in clinical applications. Besides, consistency and biological significances can be detected among different gene signatures, reflecting the

  20. Peroxidase gene discovery from the horseradish transcriptome.

    Science.gov (United States)

    Näätsaari, Laura; Krainer, Florian W; Schubert, Michael; Glieder, Anton; Thallinger, Gerhard G

    2014-03-24

    Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes.

  1. Gene discovery and molecular marker development, based on high-throughput transcript sequencing of Paspalum dilatatum Poir.

    Directory of Open Access Journals (Sweden)

    Andrea Giordano

    Full Text Available BACKGROUND: Paspalum dilatatum Poir. (common name dallisgrass is a native grass species of South America, with special relevance to dairy and red meat production. P. dilatatum exhibits higher forage quality than other C4 forage grasses and is tolerant to frost and water stress. This species is predominantly cultivated in an apomictic monoculture, with an inherent high risk that biotic and abiotic stresses could potentially devastate productivity. Therefore, advanced breeding strategies that characterise and use available genetic diversity, or assess germplasm collections effectively are required to deliver advanced cultivars for production systems. However, there are limited genomic resources available for this forage grass species. RESULTS: Transcriptome sequencing using second-generation sequencing platforms has been employed using pooled RNA from different tissues (stems, roots, leaves and inflorescences at the final reproductive stage of P. dilatatum cultivar Primo. A total of 324,695 sequence reads were obtained, corresponding to c. 102 Mbp. The sequences were assembled, generating 20,169 contigs of a combined length of 9,336,138 nucleotides. The contigs were BLAST analysed against the fully sequenced grass species of Oryza sativa subsp. japonica, Brachypodium distachyon, the closely related Sorghum bicolor and foxtail millet (Setaria italica genomes as well as against the UniRef 90 protein database allowing a comprehensive gene ontology analysis to be performed. The contigs generated from the transcript sequencing were also analysed for the presence of simple sequence repeats (SSRs. A total of 2,339 SSR motifs were identified within 1,989 contigs and corresponding primer pairs were designed. Empirical validation of a cohort of 96 SSRs was performed, with 34% being polymorphic between sexual and apomictic biotypes. CONCLUSIONS: The development of genetic and genomic resources for P. dilatatum will contribute to gene discovery and expression

  2. Fragment approaches in structure-based drug discovery

    International Nuclear Information System (INIS)

    Hubbard, Roderick E.

    2008-01-01

    Fragment-based methods are successfully generating novel and selective drug-like inhibitors of protein targets, with a number of groups reporting compounds entering clinical trials. This paper summarizes the key features of the approach as one of the tools in structure-guided drug discovery. There has been considerable interest recently in what is known as 'fragment-based lead discovery'. The novel feature of the approach is to begin with small low-affinity compounds. The main advantage is that a larger potential chemical diversity can be sampled with fewer compounds, which is particularly important for new target classes. The approach relies on careful design of the fragment library, a method that can detect binding of the fragment to the protein target, determination of the structure of the fragment bound to the target, and the conventional use of structural information to guide compound optimization. In this article the methods are reviewed, and experiences in fragment-based discovery of lead series of compounds against kinases such as PDK1 and ATPases such as Hsp90 are discussed. The examples illustrate some of the key benefits and issues of the approach and also provide anecdotal examples of the patterns seen in selectivity and the binding mode of fragments across different protein targets

  3. Development of Scientific Approach Based on Discovery Learning Module

    Science.gov (United States)

    Ellizar, E.; Hardeli, H.; Beltris, S.; Suharni, R.

    2018-04-01

    Scientific Approach is a learning process, designed to make the students actively construct their own knowledge through stages of scientific method. The scientific approach in learning process can be done by using learning modules. One of the learning model is discovery based learning. Discovery learning is a learning model for the valuable things in learning through various activities, such as observation, experience, and reasoning. In fact, the students’ activity to construct their own knowledge were not optimal. It’s because the available learning modules were not in line with the scientific approach. The purpose of this study was to develop a scientific approach discovery based learning module on Acid Based, also on electrolyte and non-electrolyte solution. The developing process of this chemistry modules use the Plomp Model with three main stages. The stages are preliminary research, prototyping stage, and the assessment stage. The subject of this research was the 10th and 11th Grade of Senior High School students (SMAN 2 Padang). Validation were tested by the experts of Chemistry lecturers and teachers. Practicality of these modules had been tested through questionnaire. The effectiveness had been tested through experimental procedure by comparing student achievement between experiment and control groups. Based on the findings, it can be concluded that the developed scientific approach discovery based learning module significantly improve the students’ learning in Acid-based and Electrolyte solution. The result of the data analysis indicated that the chemistry module was valid in content, construct, and presentation. Chemistry module also has a good practicality level and also accordance with the available time. This chemistry module was also effective, because it can help the students to understand the content of the learning material. That’s proved by the result of learning student. Based on the result can conclude that chemistry module based on

  4. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    Energy Technology Data Exchange (ETDEWEB)

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oakridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et. al. 2004, Meilan et. al 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root predominant promoters, a set of three promoters were tested for their tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et. al 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from Arabidopsis root specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post transcriptional silencing in the model Arabidopsis system; these include the floral

  5. The Matchmaker Exchange: a platform for rare disease gene discovery.

    Science.gov (United States)

    Philippakis, Anthony A; Azzariti, Danielle R; Beltran, Sergi; Brookes, Anthony J; Brownstein, Catherine A; Brudno, Michael; Brunner, Han G; Buske, Orion J; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O M; den Dunnen, Johan T; Firth, Helen V; Gibbs, Richard A; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A; Hamosh, Ada; Holm, Ingrid A; Huang, Lijia; Hurles, Matthew E; Hutton, Ben; Krier, Joel B; Misyura, Andriy; Mungall, Christopher J; Paschall, Justin; Paten, Benedict; Robinson, Peter N; Schiettecatte, François; Sobreira, Nara L; Swaminathan, Ganesh J; Taschner, Peter E; Terry, Sharon F; Washington, Nicole L; Züchner, Stephan; Boycott, Kym M; Rehm, Heidi L

    2015-10-01

    There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow. © 2015 WILEY PERIODICALS, INC.

  6. Ranking candidate disease genes from gene expression and protein interaction: a Katz-centrality based approach.

    Directory of Open Access Journals (Sweden)

    Jing Zhao

    Full Text Available Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based only on two, simple, biologically motivated assumptions--that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identifying pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.

  7. Bioinformatics for discovery of microbiome variation

    DEFF Research Database (Denmark)

    Brejnrod, Asker Daniel

    of various molecular methods to build hypotheses about the impact of a copper contaminated soil. The introduction is a broad introduction to the field of microbiome research with a focus on the technologies that enable these discoveries and how some of the broader issues have related to this thesis......Sequencing based tools have revolutionized microbiology in recent years. Highthroughput DNA sequencing have allowed high-resolution studies on microbial life in many different environments and at unprecedented low cost. These culture-independent methods have helped discovery of novel bacteria...... 1 ,“Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies”, benchmarked the performance of a variety of popular statistical methods for discovering differentially abundant bacteria . between...

  8. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

    Science.gov (United States)

    Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon

    2016-01-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.

  9. InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Konstantin Okonechnikov

    Full Text Available Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http:/bitbucket.org/kokonech/infusion.

  10. SECURE SERVICE DISCOVERY BASED ON PROBE PACKET MECHANISM FOR MANETS

    Directory of Open Access Journals (Sweden)

    S. Pariselvam

    2015-03-01

    Full Text Available In MANETs, Service discovery process is always considered to be crucial since they do not possess a centralized infrastructure for communication. Moreover, different services available through the network necessitate varying categories. Hence, a need arises for devising a secure probe based service discovery mechanism to reduce the complexity in providing the services to the network users. In this paper, we propose a Secure Service Discovery Based on Probe Packet Mechanism (SSDPPM for identifying the DoS attack in MANETs, which depicts a new approach for estimating the level of trust present in each and every routing path of a mobile ad hoc network by using probe packets. Probing based service discovery mechanisms mainly identifies a mobile node’s genuineness using a test packet called probe that travels the entire network for the sake of computing the degree of trust maintained between the mobile nodes and it’s attributed impact towards the network performance. The performance of SSDPPM is investigated through a wide range of network related parameters like packet delivery, throughput, Control overhead and total overhead using the version ns-2.26 network simulator. This mechanism SSDPPM, improves the performance of the network in an average by 23% and 19% in terms of packet delivery ratio and throughput than the existing service discovery mechanisms available in the literature.

  11. Crowdsourcing the nodulation gene network discovery environment.

    Science.gov (United States)

    Li, Yupeng; Jackson, Scott A

    2016-05-26

    The Legumes (Fabaceae) are an economically and ecologically important group of plant species with the conspicuous capacity for symbiotic nitrogen fixation in root nodules, specialized plant organs containing symbiotic microbes. With the aim of understanding the underlying molecular mechanisms leading to nodulation, many efforts are underway to identify nodulation-related genes and determine how these genes interact with each other. In order to accurately and efficiently reconstruct nodulation gene network, a crowdsourcing platform, CrowdNodNet, was created. The platform implements the jQuery and vis.js JavaScript libraries, so that users are able to interactively visualize and edit the gene network, and easily access the information about the network, e.g. gene lists, gene interactions and gene functional annotations. In addition, all the gene information is written on MediaWiki pages, enabling users to edit and contribute to the network curation. Utilizing the continuously updated, collaboratively written, and community-reviewed Wikipedia model, the platform could, in a short time, become a comprehensive knowledge base of nodulation-related pathways. The platform could also be used for other biological processes, and thus has great potential for integrating and advancing our understanding of the functional genomics and systems biology of any process for any species. The platform is available at http://crowd.bioops.info/ , and the source code can be openly accessed at https://github.com/bioops/crowdnodnet under MIT License.

  12. Culture-independent discovery of natural products from soil metagenomes.

    Science.gov (United States)

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.

  13. Exploring relation types for literature-based discovery.

    Science.gov (United States)

    Preiss, Judita; Stevenson, Mark; Gaizauskas, Robert

    2015-09-01

    Literature-based discovery (LBD) aims to identify "hidden knowledge" in the medical literature by: (1) analyzing documents to identify pairs of explicitly related concepts (terms), then (2) hypothesizing novel relations between pairs of unrelated concepts that are implicitly related via a shared concept to which both are explicitly related. Many LBD approaches use simple techniques to identify semantically weak relations between concepts, for example, document co-occurrence. These generate huge numbers of hypotheses, difficult for humans to assess. More complex techniques rely on linguistic analysis, for example, shallow parsing, to identify semantically stronger relations. Such approaches generate fewer hypotheses, but may miss hidden knowledge. The authors investigate this trade-off in detail, comparing techniques for identifying related concepts to discover which are most suitable for LBD. A generic LBD system that can utilize a range of relation types was developed. Experiments were carried out comparing a number of techniques for identifying relations. Two approaches were used for evaluation: replication of existing discoveries and the "time slicing" approach.(1) RESULTS: Previous LBD discoveries could be replicated using relations based either on document co-occurrence or linguistic analysis. Using relations based on linguistic analysis generated many fewer hypotheses, but a significantly greater proportion of them were candidates for hidden knowledge. The use of linguistic analysis-based relations improves accuracy of LBD without overly damaging coverage. LBD systems often generate huge numbers of hypotheses, which are infeasible to manually review. Improving their accuracy has the potential to make these systems significantly more usable. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  14. IMG-ABC: An Atlas of Biosynthetic Gene Clusters to Fuel the Discovery of Novel Secondary Metabolites

    Energy Technology Data Exchange (ETDEWEB)

    Chen, I-Min; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Huang, Jinghua; Reddy, T. B.K.; Cimermancic, Peter; Fischbach, Michael; Ivanova, Natalia; Markowitz, Victor; Kyrpides, Nikos; Pati, Amrita

    2014-10-28

    In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system1. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.

  15. From crystal to compound: structure-based antimalarial drug discovery.

    Science.gov (United States)

    Drinkwater, Nyssa; McGowan, Sheena

    2014-08-01

    Despite a century of control and eradication campaigns, malaria remains one of the world's most devastating diseases. Our once-powerful therapeutic weapons are losing the war against the Plasmodium parasite, whose ability to rapidly develop and spread drug resistance hamper past and present malaria-control efforts. Finding new and effective treatments for malaria is now a top global health priority, fuelling an increase in funding and promoting open-source collaborations between researchers and pharmaceutical consortia around the world. The result of this is rapid advances in drug discovery approaches and technologies, with three major methods for antimalarial drug development emerging: (i) chemistry-based, (ii) target-based, and (iii) cell-based. Common to all three of these approaches is the unique ability of structural biology to inform and accelerate drug development. Where possible, SBDD (structure-based drug discovery) is a foundation for antimalarial drug development programmes, and has been invaluable to the development of a number of current pre-clinical and clinical candidates. However, as we expand our understanding of the malarial life cycle and mechanisms of resistance development, SBDD as a field must continue to evolve in order to develop compounds that adhere to the ideal characteristics for novel antimalarial therapeutics and to avoid high attrition rates pre- and post-clinic. In the present review, we aim to examine the contribution that SBDD has made to current antimalarial drug development efforts, covering hit discovery to lead optimization and prevention of parasite resistance. Finally, the potential for structural biology, particularly high-throughput structural genomics programmes, to identify future targets for drug discovery are discussed.

  16. An Evaluation of Active Learning Causal Discovery Methods for Reverse-Engineering Local Causal Pathways of Gene Regulation

    Science.gov (United States)

    Ma, Sisi; Kemmeren, Patrick; Aliferis, Constantin F.; Statnikov, Alexander

    2016-01-01

    Reverse-engineering of causal pathways that implicate diseases and vital cellular functions is a fundamental problem in biomedicine. Discovery of the local causal pathway of a target variable (that consists of its direct causes and direct effects) is essential for effective intervention and can facilitate accurate diagnosis and prognosis. Recent research has provided several active learning methods that can leverage passively observed high-throughput data to draft causal pathways and then refine the inferred relations with a limited number of experiments. The current study provides a comprehensive evaluation of the performance of active learning methods for local causal pathway discovery in real biological data. Specifically, 54 active learning methods/variants from 3 families of algorithms were applied for local causal pathways reconstruction of gene regulation for 5 transcription factors in S. cerevisiae. Four aspects of the methods’ performance were assessed, including adjacency discovery quality, edge orientation accuracy, complete pathway discovery quality, and experimental cost. The results of this study show that some methods provide significant performance benefits over others and therefore should be routinely used for local causal pathway discovery tasks. This study also demonstrates the feasibility of local causal pathway reconstruction in real biological systems with significant quality and low experimental cost. PMID:26939894

  17. Current perspectives in fragment-based lead discovery (FBLD)

    Science.gov (United States)

    Lamoree, Bas; Hubbard, Roderick E.

    2017-01-01

    It is over 20 years since the first fragment-based discovery projects were disclosed. The methods are now mature for most ‘conventional’ targets in drug discovery such as enzymes (kinases and proteases) but there has also been growing success on more challenging targets, such as disruption of protein–protein interactions. The main application is to identify tractable chemical startpoints that non-covalently modulate the activity of a biological molecule. In this essay, we overview current practice in the methods and discuss how they have had an impact in lead discovery – generating a large number of fragment-derived compounds that are in clinical trials and two medicines treating patients. In addition, we discuss some of the more recent applications of the methods in chemical biology – providing chemical tools to investigate biological molecules, mechanisms and systems. PMID:29118093

  18. A collaborative filtering-based approach to biomedical knowledge discovery.

    Science.gov (United States)

    Lever, Jake; Gakkhar, Sitanshu; Gottlieb, Michael; Rashnavadi, Tahereh; Lin, Santina; Siu, Celia; Smith, Maia; Jones, Martin R; Krzywinski, Martin; Jones, Steven J M; Wren, Jonathan

    2018-02-15

    The increase in publication rates makes it challenging for an individual researcher to stay abreast of all relevant research in order to find novel research hypotheses. Literature-based discovery methods make use of knowledge graphs built using text mining and can infer future associations between biomedical concepts that will likely occur in new publications. These predictions are a valuable resource for researchers to explore a research topic. Current methods for prediction are based on the local structure of the knowledge graph. A method that uses global knowledge from across the knowledge graph needs to be developed in order to make knowledge discovery a frequently used tool by researchers. We propose an approach based on the singular value decomposition (SVD) that is able to combine data from across the knowledge graph through a reduced representation. Using cooccurrence data extracted from published literature, we show that SVD performs better than the leading methods for scoring discoveries. We also show the diminishing predictive power of knowledge discovery as we compare our predictions with real associations that appear further into the future. Finally, we examine the strengths and weaknesses of the SVD approach against another well-performing system using several predicted associations. All code and results files for this analysis can be accessed at https://github.com/jakelever/knowledgediscovery. sjones@bcgsc.ca. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  19. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore accounts a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of EOperon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. [Fragment-based drug discovery: concept and aim].

    Science.gov (United States)

    Tanaka, Daisuke

    2010-03-01

    Fragment-Based Drug Discovery (FBDD) has been recognized as a newly emerging lead discovery methodology that involves biophysical fragment screening and chemistry-driven fragment-to-lead stages. Although fragments, defined as structurally simple and small compounds (typically FBDD primarily turns our attention to weakly but specifically binding fragments (hit fragments) as the starting point of medicinal chemistry. Hit fragments are then promoted to more potent lead compounds through linking or merging with another hit fragment and/or attaching functional groups. Another positive aspect of FBDD is ligand efficiency. Ligand efficiency is a useful guide in screening hit selection and hit-to-lead phases to achieve lead-likeness. Owing to these features, a number of successful applications of FBDD to "undruggable targets" (where HTS and other lead identification methods failed to identify useful lead compounds) have been reported. As a result, FBDD is now expected to complement more conventional methodologies. This review, as an introduction of the following articles, will summarize the fundamental concepts of FBDD and will discuss its advantages over other conventional drug discovery approaches.

  1. Evaluation of gene association methods for coexpression network construction and biological knowledge discovery.

    Directory of Open Access Journals (Sweden)

    Sapna Kumari

    Full Text Available BACKGROUND: Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical. METHODS AND RESULTS: In this study, we compared eight gene association methods - Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson - and focused on their true knowledge discovery rates in associating pathway genes and construction coordination networks of regulatory genes. We also examined the behaviors of different methods to microarray data with different properties, and whether the biological processes affect the efficiency of different methods. CONCLUSIONS: We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method that can find gene pairs of multiple relationships. Some analyses we did clearly show Pearson and Distance Covariance methods have distinct behaviors as compared to all other six methods. The efficiencies of different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates the pre-analysis to identify the best performing method for gene association and coexpression network construction.

  2. Rediscovering Don Swanson: The Past, Present and Future of Literature-based Discovery

    Directory of Open Access Journals (Sweden)

    Neil R. Smalheiser

    2017-12-01

    Full Text Available Purpose: The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don’s contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed. Design/methodology/approach: Personal recollections and literature review. Findings: The Swanson A-B-C model of literature-based discovery has been successfully used by laboratory investigators analyzing their findings and hypotheses. It continues to be a fertile area of research in a wide range of application areas including text mining, drug repurposing, studies of scientific innovation, knowledge discovery in databases, and bioinformatics. Recently, additional modes of discovery that do not follow the A-B-C model have also been proposed and explored (e.g. so-called storytelling, gaps, analogies, link prediction, negative consensus, outliers, and revival of neglected or discarded research questions. Research limitations: This paper reflects the opinions of the author and is not a comprehensive nor technically based review of literature-based discovery. Practical implications: The general scientific public is still not aware of the availability of tools for literature-based discovery. Our Arrowsmith project site maintains a suite of discovery tools that are free and open to the public (http://arrowsmith.psych.uic.edu, as does BITOLA which is maintained by Dmitar Hristovski (http:// http://ibmi.mf.uni-lj.si/bitola, and Epiphanet which is maintained by Trevor Cohen (http://epiphanet.uth.tmc.edu/. Bringing user-friendly tools to the public should be a high priority, since even more than advancing basic research in informatics, it is vital that we ensure that scientists

  3. Effectiveness of Discovery Learning-Based Transformation Geometry Module

    Science.gov (United States)

    Febriana, R.; Haryono, Y.; Yusri, R.

    2017-09-01

    Development of transformation geometry module is conducted because the students got difficulties to understand the existing book. The purpose of the research was to find out the effectiveness of discovery learning-based transformation geometry module toward student’s activity. Model of the development was Plomp model consisting preliminary research, prototyping phase and assessment phase. The research was focused on assessment phase where it was to observe the designed product effectiveness. The instrument was observation sheet. The observed activities were visual activities, oral activities, listening activities, mental activities, emotional activities and motor activities. Based on the result of the research, it is found that visual activities, learning activities, writing activities, the student’s activity is in the criteria very effective. It can be concluded that the use of discovery learning-based transformation geometry module use can increase the positive student’s activity and decrease the negative activity.

  4. Wide-Area Publish/Subscribe Mobile Resource Discovery Based on IPv6 GeoNetworking

    OpenAIRE

    Noguchi, Satoru; Matsuura, Satoshi; Inomata, Atsuo; Fujikawa, Kazutoshi; Sunahara, Hideki

    2013-01-01

    Resource discovery is an essential function for distributed mobile applications integrated in vehicular communication systems. Key requirements of the mobile resource discovery are wide-area geographic-based discovery and scalable resource discovery not only inside a vehicular ad-hoc network but also through the Internet. While a number of resource discovery solutions have been proposed, most of them have focused on specific scale of network. Furthermore, managing a large number of mobile res...

  5. Cultivation of hard-to-culture subsurface mercury-resistant bacteria and discovery of new merA gene sequences

    DEFF Research Database (Denmark)

    Rasmussen, L D; Zawadsky, C; Binnerup, S J

    2008-01-01

    different 16S rRNA gene sequences were observed, including Alpha-, Beta-, and Gammaproteobacteria; Actinobacteria; Firmicutes; and Bacteroidetes. The diversity of isolates obtained by direct plating included eight different 16S rRNA gene sequences (Alpha- and Betaproteobacteria and Actinobacteria). Partial...... sequencing of merA of selected isolates led to the discovery of new merA sequences. With phylum-specific merA primers, PCR products were obtained for Alpha- and Betaproteobacteria and Actinobacteria but not for Bacteroidetes and Firmicutes. The similarity to known sequences ranged between 89 and 95%. One...

  6. Weighted similarity-based clustering of chemical structures and bioactivity data in early drug discovery.

    Science.gov (United States)

    Perualila-Tan, Nolen Joy; Shkedy, Ziv; Talloen, Willem; Göhlmann, Hinrich W H; Moerbeke, Marijke Van; Kasim, Adetayo

    2016-08-01

    The modern process of discovering candidate molecules in early drug discovery phase includes a wide range of approaches to extract vital information from the intersection of biology and chemistry. A typical strategy in compound selection involves compound clustering based on chemical similarity to obtain representative chemically diverse compounds (not incorporating potency information). In this paper, we propose an integrative clustering approach that makes use of both biological (compound efficacy) and chemical (structural features) data sources for the purpose of discovering a subset of compounds with aligned structural and biological properties. The datasets are integrated at the similarity level by assigning complementary weights to produce a weighted similarity matrix, serving as a generic input in any clustering algorithm. This new analysis work flow is semi-supervised method since, after the determination of clusters, a secondary analysis is performed wherein it finds differentially expressed genes associated to the derived integrated cluster(s) to further explain the compound-induced biological effects inside the cell. In this paper, datasets from two drug development oncology projects are used to illustrate the usefulness of the weighted similarity-based clustering approach to integrate multi-source high-dimensional information to aid drug discovery. Compounds that are structurally and biologically similar to the reference compounds are discovered using this proposed integrative approach.

  7. Functional Gene Discovery and Characterization of Genes and Alleles Affecting Wood Biomass Yield and Quality in Populus

    Energy Technology Data Exchange (ETDEWEB)

    Busov, Victor [Michigan Technological Univ., Houghton, MI (United States)

    2017-02-12

    Adoption of biofuels as economically and environmentally viable alternative to fossil fuels would require development of specialized bioenergy varieties. A major goal in the breeding of such varieties is the improvement of lignocellulosic biomass yield and quality. These are complex traits and understanding the underpinning molecular mechanism can assist and accelerate their improvement. This is particularly important for tree bioenergy crops like poplars (species and hybrids from the genus Populus), for which breeding progress is extremely slow due to long generation cycles. A variety of approaches have been already undertaken to better understand the molecular bases of biomass yield and quality in poplar. An obvious void in these undertakings has been the application of mutagenesis. Mutagenesis has been instrumental in the discovery and characterization of many plant traits including such that affect biomass yield and quality. In this proposal we use activation tagging to discover genes that can significantly affect biomass associated traits directly in poplar, a premier bioenergy crop. We screened a population of 5,000 independent poplar activation tagging lines under greenhouse conditions for a battery of biomass yield traits. These same plants were then analyzed for changes in wood chemistry using pyMBMS. As a result of these screens we have identified nearly 800 mutants, which are significantly (P<0.05) different when compared to wild type. Of these majority (~700) are affected in one of ten different biomass yield traits and 100 in biomass quality traits (e.g., lignin, S/G ration and C6/C5 sugars). We successfully recovered the position of the tag in approximately 130 lines, showed activation in nearly half of them and performed recapitulation experiments with 20 genes prioritized by the significance of the phenotype. Recapitulation experiments are still ongoing for many of the genes but the results are encouraging. For example, we have shown successful

  8. Influence networks based on coexpression improve drug target discovery for the development of novel cancer therapeutics

    Science.gov (United States)

    2014-01-01

    Background The demand for novel molecularly targeted drugs will continue to rise as we move forward toward the goal of personalizing cancer treatment to the molecular signature of individual tumors. However, the identification of targets and combinations of targets that can be safely and effectively modulated is one of the greatest challenges facing the drug discovery process. A promising approach is to use biological networks to prioritize targets based on their relative positions to one another, a property that affects their ability to maintain network integrity and propagate information-flow. Here, we introduce influence networks and demonstrate how they can be used to generate influence scores as a network-based metric to rank genes as potential drug targets. Results We use this approach to prioritize genes as drug target candidates in a set of ER + breast tumor samples collected during the course of neoadjuvant treatment with the aromatase inhibitor letrozole. We show that influential genes, those with high influence scores, tend to be essential and include a higher proportion of essential genes than those prioritized based on their position (i.e. hubs or bottlenecks) within the same network. Additionally, we show that influential genes represent novel biologically relevant drug targets for the treatment of ER + breast cancers. Moreover, we demonstrate that gene influence differs between untreated tumors and residual tumors that have adapted to drug treatment. In this way, influence scores capture the context-dependent functions of genes and present the opportunity to design combination treatment strategies that take advantage of the tumor adaptation process. Conclusions Influence networks efficiently find essential genes as promising drug targets and combinations of targets to inform the development of molecularly targeted drugs and their use. PMID:24495353

  9. TargetMine, an integrated data warehouse for candidate gene prioritisation and target discovery.

    Directory of Open Access Journals (Sweden)

    Yi-An Chen

    Full Text Available Prioritising candidate genes for further experimental characterisation is a non-trivial challenge in drug discovery and biomedical research in general. An integrated approach that combines results from multiple data types is best suited for optimal target selection. We developed TargetMine, a data warehouse for efficient target prioritisation. TargetMine utilises the InterMine framework, with new data models such as protein-DNA interactions integrated in a novel way. It enables complicated searches that are difficult to perform with existing tools and it also offers integration of custom annotations and in-house experimental data. We proposed an objective protocol for target prioritisation using TargetMine and set up a benchmarking procedure to evaluate its performance. The results show that the protocol can identify known disease-associated genes with high precision and coverage. A demonstration version of TargetMine is available at http://targetmine.nibio.go.jp/.

  10. Genome-wide target profiling of piggyBac and Tol2 in HEK 293: pros and cons for gene discovery and gene therapy

    Science.gov (United States)

    2011-01-01

    Background DNA transposons have emerged as indispensible tools for manipulating vertebrate genomes with applications ranging from insertional mutagenesis and transgenesis to gene therapy. To fully explore the potential of two highly active DNA transposons, piggyBac and Tol2, as mammalian genetic tools, we have conducted a side-by-side comparison of the two transposon systems in the same setting to evaluate their advantages and disadvantages for use in gene therapy and gene discovery. Results We have observed that (1) the Tol2 transposase (but not piggyBac) is highly sensitive to molecular engineering; (2) the piggyBac donor with only the 40 bp 3'-and 67 bp 5'-terminal repeat domain is sufficient for effective transposition; and (3) a small amount of piggyBac transposases results in robust transposition suggesting the piggyBac transpospase is highly active. Performing genome-wide target profiling on data sets obtained by retrieving chromosomal targeting sequences from individual clones, we have identified several piggyBac and Tol2 hotspots and observed that (4) piggyBac and Tol2 display a clear difference in targeting preferences in the human genome. Finally, we have observed that (5) only sites with a particular sequence context can be targeted by either piggyBac or Tol2. Conclusions The non-overlapping targeting preference of piggyBac and Tol2 makes them complementary research tools for manipulating mammalian genomes. PiggyBac is the most promising transposon-based vector system for achieving site-specific targeting of therapeutic genes due to the flexibility of its transposase for being molecularly engineered. Insights from this study will provide a basis for engineering piggyBac transposases to achieve site-specific therapeutic gene targeting. PMID:21447194

  11. Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

    Science.gov (United States)

    Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

    2015-01-01

    In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.

  12. CLARM: An integrative approach for functional modules discovery

    KAUST Repository

    Salem, Saeed M.; Alroobi, Rami; Banitaan, Shadi; Seridi, Loqmane; Brewer, James E.; Aljarah, Ibrahim

    2011-01-01

    Functional module discovery aims to find well-connected subnetworks which can serve as candidate protein complexes. Advances in High-throughput proteomic technologies have enabled the collection of large amount of interaction data as well as gene expression data. We propose, CLARM, a clustering algorithm that integrates gene expression profiles and protein protein interaction network for biological modules discovery. The main premise is that by enriching the interaction network by adding interactions between genes which are highly co-expressed over a wide range of biological and environmental conditions, we can improve the quality of the discovered modules. Protein protein interactions, known protein complexes, and gene expression profiles for diverse environmental conditions from the yeast Saccharomyces cerevisiae were used for evaluate the biological significance of the reported modules. Our experiments show that the CLARM approach is competitive to wellestablished module discovery methods. Copyright © 2011 ACM.

  13. Using the TIGR gene index databases for biological discovery.

    Science.gov (United States)

    Lee, Yuandan; Quackenbush, John

    2003-11-01

    The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.

  14. Bead-based screening in chemical biology and drug discovery

    DEFF Research Database (Denmark)

    Komnatnyy, Vitaly V.; Nielsen, Thomas Eiland; Qvortrup, Katrine

    2018-01-01

    libraries for early drug discovery. Among the various library forms, the one-bead-one-compound (OBOC) library, where each bead carries many copies of a single compound, holds the greatest potential for the rapid identification of novel hits against emerging drug targets. However, this potential has not yet...... been fully realized due to a number of technical obstacles. In this feature article, we review the progress that has been made towards bead-based library screening and applications to the discovery of bioactive compounds. We identify the key challenges of this approach and highlight key steps needed......High-throughput screening is an important component of the drug discovery process. The screening of libraries containing hundreds of thousands of compounds requires assays amanable to miniaturisation and automization. Combinatorial chemistry holds a unique promise to deliver structural diverse...

  15. Fragment-based drug discovery and protein–protein interactions

    Directory of Open Access Journals (Sweden)

    Turnbull AP

    2014-09-01

    Full Text Available Andrew P Turnbull,1 Susan M Boyd,2 Björn Walse31CRT Discovery Laboratories, Department of Biological Sciences, Birkbeck, University of London, London, UK; 2IOTA Pharmaceuticals Ltd, Cambridge, UK; 3SARomics Biostructures AB, Lund, SwedenAbstract: Protein–protein interactions (PPIs are involved in many biological processes, with an estimated 400,000 PPIs within the human proteome. There is significant interest in exploiting the relatively unexplored potential of these interactions in drug discovery, driven by the need to find new therapeutic targets. Compared with classical drug discovery against targets with well-defined binding sites, developing small-molecule inhibitors against PPIs where the contact surfaces are frequently more extensive and comparatively flat, with most of the binding energy localized in “hot spots”, has proven far more challenging. However, despite the difficulties associated with targeting PPIs, important progress has been made in recent years with fragment-based drug discovery playing a pivotal role in improving their tractability. Computational and empirical approaches can be used to identify hot-spot regions and assess the druggability and ligandability of new targets, whilst fragment screening campaigns can detect low-affinity fragments that either directly or indirectly perturb the PPI. Once fragment hits have been identified and confirmed using biochemical and biophysical approaches, three-dimensional structural data derived from nuclear magnetic resonance or X-ray crystallography can be used to drive medicinal chemistry efforts towards the development of more potent inhibitors. A small-scale comparison presented in this review of “standard” fragments with those targeting PPIs has revealed that the latter tend to be larger, be more lipophilic, and contain more polar (acid/base functionality, whereas three-dimensional descriptor data indicate that there is little difference in their three

  16. Empirical study of supervised gene screening

    Directory of Open Access Journals (Sweden)

    Ma Shuangge

    2006-12-01

    Full Text Available Abstract Background Microarray studies provide a way of linking variations of phenotypes with their genetic causations. Constructing predictive models using high dimensional microarray measurements usually consists of three steps: (1 unsupervised gene screening; (2 supervised gene screening; and (3 statistical model building. Supervised gene screening based on marginal gene ranking is commonly used to reduce the number of genes in the model building. Various simple statistics, such as t-statistic or signal to noise ratio, have been used to rank genes in the supervised screening. Despite of its extensive usage, statistical study of supervised gene screening remains scarce. Our study is partly motivated by the differences in gene discovery results caused by using different supervised gene screening methods. Results We investigate concordance and reproducibility of supervised gene screening based on eight commonly used marginal statistics. Concordance is assessed by the relative fractions of overlaps between top ranked genes screened using different marginal statistics. We propose a Bootstrap Reproducibility Index, which measures reproducibility of individual genes under the supervised screening. Empirical studies are based on four public microarray data. We consider the cases where the top 20%, 40% and 60% genes are screened. Conclusion From a gene discovery point of view, the effect of supervised gene screening based on different marginal statistics cannot be ignored. Empirical studies show that (1 genes passed different supervised screenings may be considerably different; (2 concordance may vary, depending on the underlying data structure and percentage of selected genes; (3 evaluated with the Bootstrap Reproducibility Index, genes passed supervised screenings are only moderately reproducible; and (4 concordance cannot be improved by supervised screening based on reproducibility.

  17. Topology Discovery Using Cisco Discovery Protocol

    OpenAIRE

    Rodriguez, Sergio R.

    2009-01-01

    In this paper we address the problem of discovering network topology in proprietary networks. Namely, we investigate topology discovery in Cisco-based networks. Cisco devices run Cisco Discovery Protocol (CDP) which holds information about these devices. We first compare properties of topologies that can be obtained from networks deploying CDP versus Spanning Tree Protocol (STP) and Management Information Base (MIB) Forwarding Database (FDB). Then we describe a method of discovering topology ...

  18. Handling Neighbor Discovery and Rendezvous Consistency with Weighted Quorum-Based Approach.

    Science.gov (United States)

    Own, Chung-Ming; Meng, Zhaopeng; Liu, Kehan

    2015-09-03

    Neighbor discovery and the power of sensors play an important role in the formation of Wireless Sensor Networks (WSNs) and mobile networks. Many asynchronous protocols based on wake-up time scheduling have been proposed to enable neighbor discovery among neighboring nodes for the energy saving, especially in the difficulty of clock synchronization. However, existing researches are divided two parts with the neighbor-discovery methods, one is the quorum-based protocols and the other is co-primality based protocols. Their distinction is on the arrangements of time slots, the former uses the quorums in the matrix, the latter adopts the numerical analysis. In our study, we propose the weighted heuristic quorum system (WQS), which is based on the quorum algorithm to eliminate redundant paths of active slots. We demonstrate the specification of our system: fewer active slots are required, the referring rate is balanced, and remaining power is considered particularly when a device maintains rendezvous with discovered neighbors. The evaluation results showed that our proposed method can effectively reschedule the active slots and save the computing time of the network system.

  19. Handling Neighbor Discovery and Rendezvous Consistency with Weighted Quorum-Based Approach

    Directory of Open Access Journals (Sweden)

    Chung-Ming Own

    2015-09-01

    Full Text Available Neighbor discovery and the power of sensors play an important role in the formation of Wireless Sensor Networks (WSNs and mobile networks. Many asynchronous protocols based on wake-up time scheduling have been proposed to enable neighbor discovery among neighboring nodes for the energy saving, especially in the difficulty of clock synchronization. However, existing researches are divided two parts with the neighbor-discovery methods, one is the quorum-based protocols and the other is co-primality based protocols. Their distinction is on the arrangements of time slots, the former uses the quorums in the matrix, the latter adopts the numerical analysis. In our study, we propose the weighted heuristic quorum system (WQS, which is based on the quorum algorithm to eliminate redundant paths of active slots. We demonstrate the specification of our system: fewer active slots are required, the referring rate is balanced, and remaining power is considered particularly when a device maintains rendezvous with discovered neighbors. The evaluation results showed that our proposed method can effectively reschedule the active slots and save the computing time of the network system.

  20. Medicinal chemistry inspired fragment-based drug discovery.

    Science.gov (United States)

    Lanter, James; Zhang, Xuqing; Sui, Zhihua

    2011-01-01

    Lead generation can be a very challenging phase of the drug discovery process. The two principal methods for this stage of research are blind screening and rational design. Among the rational or semirational design approaches, fragment-based drug discovery (FBDD) has emerged as a useful tool for the generation of lead structures. It is particularly powerful as a complement to high-throughput screening approaches when the latter failed to yield viable hits for further development. Engagement of medicinal chemists early in the process can accelerate the progression of FBDD efforts by incorporating drug-friendly properties in the earliest stages of the design process. Medium-chain acyl-CoA synthetase 2b and ketohexokinase are chosen as examples to illustrate the importance of close collaboration of medicinal chemists, crystallography, and modeling. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria.

    Directory of Open Access Journals (Sweden)

    Mario Fruzangohar

    Full Text Available The primary means of classifying new functions for genes and proteins relies on Gene Ontology (GO, which defines genes/proteins using a controlled vocabulary in terms of their Molecular Function, Biological Process and Cellular Component. The challenge is to present this information to researchers to compare and discover patterns in multiple datasets using visually comprehensible and user-friendly statistical reports. Importantly, while there are many GO resources available for eukaryotes, there are none suitable for simultaneous, graphical and statistical comparison between multiple datasets. In addition, none of them supports comprehensive resources for bacteria. By using Streptococcus pneumoniae as a model, we identified and collected GO resources including genes, proteins, taxonomy and GO relationships from NCBI, UniProt and GO organisations. Then, we designed database tables in PostgreSQL database server and developed a Java application to extract data from source files and loaded into database automatically. We developed a PHP web application based on Model-View-Control architecture, used a specific data structure as well as current and novel algorithms to estimate GO graphs parameters. We designed different navigation and visualization methods on the graphs and integrated these into graphical reports. This tool is particularly significant when comparing GO groups between multiple samples (including those of pathogenic bacteria from different sources simultaneously. Comparing GO protein distribution among up- or down-regulated genes from different samples can improve understanding of biological pathways, and mechanism(s of infection. It can also aid in the discovery of genes associated with specific function(s for investigation as a novel vaccine or therapeutic targets.http://turing.ersa.edu.au/BacteriaGO.

  2. SemaTyP: a knowledge graph based literature mining method for drug discovery.

    Science.gov (United States)

    Sang, Shengtian; Yang, Zhihao; Wang, Lei; Liu, Xiaoxia; Lin, Hongfei; Wang, Jian

    2018-05-30

    Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are the two main drug discovery methods for now, which have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments. It could support experts in biomedicine on their way towards new discoveries. Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts, then a logistic regression model is trained by learning the semantic types of paths of known drug therapies' existing in the biomedical knowledge graph, finally the learned model is used to discover drug therapies for new diseases. The experimental results show that our method could not only effectively discover new drug therapies for new diseases, but also could provide the potential mechanism of action of the candidate drugs. In this paper we propose a novel knowledge graph based literature mining method for drug discovery. It could be a supplementary method for current drug discovery methods.

  3. Discovery of pyridine-based agrochemicals by using Intermediate Derivatization Methods.

    Science.gov (United States)

    Guan, Ai-Ying; Liu, Chang-Ling; Sun, Xu-Feng; Xie, Yong; Wang, Ming-An

    2016-02-01

    Pyridine-based compounds have been playing a crucial role as agrochemicals or pesticides including fungicides, insecticides/acaricides and herbicides, etc. Since most of the agrochemicals listed in the Pesticide Manual were discovered through screening programs that relied on trial-and-error testing and new agrochemical discovery is not benefiting as much from the in silico new chemical compound identification/discovery techniques used in pharmaceutical research, it has become more important to find new methods to enhance the efficiency of discovering novel lead compounds in the agrochemical field to shorten the time of research phases in order to meet changing market requirements. In this review, we selected 18 representative known agrochemicals containing a pyridine moiety and extrapolate their discovery from the perspective of Intermediate Derivatization Methods in the hope that this approach will have greater appeal to researchers engaged in the discovery of agrochemicals and/or pharmaceuticals. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Reconstructing Sessions from Data Discovery and Access Logs to Build a Semantic Knowledge Base for Improving Data Discovery

    Directory of Open Access Journals (Sweden)

    Yongyao Jiang

    2016-04-01

    Full Text Available Big geospatial data are archived and made available through online web discovery and access. However, finding the right data for scientific research and application development is still a challenge. This paper aims to improve the data discovery by mining the user knowledge from log files. Specifically, user web session reconstruction is focused upon in this paper as a critical step for extracting usage patterns. However, reconstructing user sessions from raw web logs has always been difficult, as a session identifier tends to be missing in most data portals. To address this problem, we propose two session identification methods, including time-clustering-based and time-referrer-based methods. We also present the workflow of session reconstruction and discuss the approach of selecting appropriate thresholds for relevant steps in the workflow. The proposed session identification methods and workflow are proven to be able to extract data access patterns for further pattern analyses of user behavior and improvement of data discovery for more relevancy data ranking, suggestion, and navigation.

  5. Generation of cell lines for drug discovery through random activation of gene expression: application to the human histamine H3 receptor.

    Science.gov (United States)

    Song, J; Doucette, C; Hanniford, D; Hunady, K; Wang, N; Sherf, B; Harrington, J J; Brunden, K R; Stricker-Krongrad, A

    2005-06-01

    Target-based high-throughput screening (HTS) plays an integral role in drug discovery. The implementation of HTS assays generally requires high expression levels of the target protein, and this is typically accomplished using recombinant cDNA methodologies. However, the isolated gene sequences to many drug targets have intellectual property claims that restrict the ability to implement drug discovery programs. The present study describes the pharmacological characterization of the human histamine H3 receptor that was expressed using random activation of gene expression (RAGE), a technology that over-expresses proteins by up-regulating endogenous genes rather than introducing cDNA expression vectors into the cell. Saturation binding analysis using [125I]iodoproxyfan and RAGE-H3 membranes revealed a single class of binding sites with a K(D) value of 0.77 nM and a B(max) equal to 756 fmol/mg of protein. Competition binding studies showed that the rank order of potency for H3 agonists was N(alpha)-methylhistamine approximately (R)-alpha- methylhistamine > histamine and that the rank order of potency for H3 antagonists was clobenpropit > iodophenpropit > thioperamide. The same rank order of potency for H3 agonists and antagonists was observed in the functional assays as in the binding assays. The Fluorometic Imaging Plate Reader assays in RAGE-H3 cells gave high Z' values for agonist and antagonist screening, respectively. These results reveal that the human H3 receptor expressed with the RAGE technology is pharmacologically comparable to that expressed through recombinant methods. Moreover, the level of expression of the H3 receptor in the RAGE-H3 cells is suitable for HTS and secondary assays.

  6. Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.

    Science.gov (United States)

    Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C

    2018-08-01

    Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.

  7. A P2P Service Discovery Strategy Based on Content Catalogues

    Directory of Open Access Journals (Sweden)

    Lican Huang

    2007-08-01

    Full Text Available This paper presents a framework for distributed service discovery based on VIRGO P2P technologies. The services are classified as multi-layer, hierarchical catalogue domains according to their contents. The service providers, which have their own service registries such as UDDIs, register the services they provide and establish a virtual tree in a VIRGO network according to the domain of their service. The service location done by the proposed strategy is effective and guaranteed. This paper also discusses the primary implementation of service discovery based on Tomcat/Axis and jUDDI.

  8. Improving sensitivity of linear regression-based cell type-specific differential expression deconvolution with per-gene vs. global significance threshold.

    Science.gov (United States)

    Glass, Edmund R; Dozmorov, Mikhail G

    2016-10-06

    The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics. The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions

  9. Generation of comprehensive transposon insertion mutant library for the model archaeon, Haloferax volcanii, and its use for gene discovery.

    Science.gov (United States)

    Kiljunen, Saija; Pajunen, Maria I; Dilks, Kieran; Storf, Stefanie; Pohlschroder, Mechthild; Savilahti, Harri

    2014-12-09

    Archaea share fundamental properties with bacteria and eukaryotes. Yet, they also possess unique attributes, which largely remain poorly characterized. Haloferax volcanii is an aerobic, moderately halophilic archaeon that can be grown in defined media. It serves as an excellent archaeal model organism to study the molecular mechanisms of biological processes and cellular responses to changes in the environment. Studies on haloarchaea have been impeded by the lack of efficient genetic screens that would facilitate the identification of protein functions and respective metabolic pathways. Here, we devised an insertion mutagenesis strategy that combined Mu in vitro DNA transposition and homologous-recombination-based gene targeting in H. volcanii. We generated an insertion mutant library, in which the clones contained a single genomic insertion. From the library, we isolated pigmentation-defective and auxotrophic mutants, and the respective insertions pinpointed a number of genes previously known to be involved in carotenoid and amino acid biosynthesis pathways, thus validating the performance of the methodologies used. We also identified mutants that had a transposon insertion in a gene encoding a protein of unknown or putative function, demonstrating that novel roles for non-annotated genes could be assigned. We have generated, for the first time, a random genomic insertion mutant library for a halophilic archaeon and used it for efficient gene discovery. The library will facilitate the identification of non-essential genes behind any specific biochemical pathway. It represents a significant step towards achieving a more complete understanding of the unique characteristics of halophilic archaea.

  10. Discovery of potent, reversible MetAP2 inhibitors via fragment based drug discovery and structure based drug design-Part 2.

    Science.gov (United States)

    McBride, Christopher; Cheruvallath, Zacharia; Komandla, Mallareddy; Tang, Mingnam; Farrell, Pamela; Lawson, J David; Vanderpool, Darin; Wu, Yiqin; Dougan, Douglas R; Plonowski, Artur; Holub, Corine; Larson, Chris

    2016-06-15

    Methionine aminopeptidase-2 (MetAP2) is an enzyme that cleaves an N-terminal methionine residue from a number of newly synthesized proteins. This step is required before they will fold or function correctly. Pre-clinical and clinical studies with a MetAP2 inhibitor suggest that they could be used as a novel treatment for obesity. Herein we describe the discovery of a series of pyrazolo[4,3-b]indoles as reversible MetAP2 inhibitors. A fragment-based drug discovery (FBDD) approach was used, beginning with the screening of fragment libraries to generate hits with high ligand-efficiency (LE). An indazole core was selected for further elaboration, guided by structural information. SAR from the indazole series led to the design of a pyrazolo[4,3-b]indole core and accelerated knowledge-based fragment growth resulted in potent and efficient MetAP2 inhibitors, which have shown robust and sustainable body weight loss in DIO mice when dosed orally. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Advances in fragment-based drug discovery platforms.

    Science.gov (United States)

    Orita, Masaya; Warizaya, Masaichi; Amano, Yasushi; Ohno, Kazuki; Niimi, Tatsuya

    2009-11-01

    Fragment-based drug discovery (FBDD) has been established as a powerful alternative and complement to traditional high-throughput screening techniques for identifying drug leads. At present, this technique is widely used among academic groups as well as small biotech and large pharmaceutical companies. In recent years, > 10 new compounds developed with FBDD have entered clinical development, and more and more attention in the drug discovery field is being focused on this technique. Under the FBDD approach, a fragment library of relatively small compounds (molecular mass = 100 - 300 Da) is screened by various methods and the identified fragment hits which normally weakly bind to the target are used as starting points to generate more potent drug leads. Because FBDD is still a relatively new drug discovery technology, further developments and optimizations in screening platforms and fragment exploitation can be expected. This review summarizes recent advances in FBDD platforms and discusses the factors important for the successful application of this technique. Under the FBDD approach, both identifying the starting fragment hit to be developed and generating the drug lead from that starting fragment hit are important. Integration of various techniques, such as computational technology, X-ray crystallography, NMR, surface plasmon resonance, isothermal titration calorimetry, mass spectrometry and high-concentration screening, must be applied in a situation-appropriate manner.

  12. Gene discovery in Triatoma infestans

    Directory of Open Access Journals (Sweden)

    de Burgos Nelia

    2011-03-01

    Full Text Available Abstract Background Triatoma infestans is the most relevant vector of Chagas disease in the southern cone of South America. Since its genome has not yet been studied, sequencing of Expressed Sequence Tags (ESTs is one of the most powerful tools for efficiently identifying large numbers of expressed genes in this insect vector. Results In this work, we generated 826 ESTs, resulting in an increase of 47% in the number of ESTs available for T. infestans. These ESTs were assembled in 471 unique sequences, 151 of which represent 136 new genes for the Reduviidae family. Conclusions Among the putative new genes for the Reduviidae family, we identified and described an interesting subset of genes involved in development and reproduction, which constitute potential targets for insecticide development.

  13. Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems

    Directory of Open Access Journals (Sweden)

    Chung I-Fang

    2008-10-01

    Full Text Available Abstract Background The Signal-to-Noise-Ratio (SNR is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs. These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems. Results and discussion To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer. For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC, Nearest Mean Classifier (NMC, Support Vector Machine (SVM classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA and one-vs-one (OVO strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the

  14. Network-Based Method for Identifying Co- Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues.

    Science.gov (United States)

    Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong

    2017-10-02

    Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.

  15. Fragment-based drug discovery and molecular docking in drug design.

    Science.gov (United States)

    Wang, Tao; Wu, Mian-Bin; Chen, Zheng-Jie; Chen, Hua; Lin, Jian-Ping; Yang, Li-Rong

    2015-01-01

    Fragment-based drug discovery (FBDD) has caused a revolution in the process of drug discovery and design, with many FBDD leads being developed into clinical trials or approved in the past few years. Compared with traditional high-throughput screening, it displays obvious advantages such as efficiently covering chemical space, achieving higher hit rates, and so forth. In this review, we focus on the most recent developments of FBDD for improving drug discovery, illustrating the process and the importance of FBDD. In particular, the computational strategies applied in the process of FBDD and molecular-docking programs are highlighted elaborately. In most cases, docking is used for predicting the ligand-receptor interaction modes and hit identification by structurebased virtual screening. The successful cases of typical significance and the hits identified most recently are discussed.

  16. Single-gene prognostic signatures for advanced stage serous ovarian cancer based on 1257 patient samples.

    Science.gov (United States)

    Zhang, Fan; Yang, Kai; Deng, Kui; Zhang, Yuanyuan; Zhao, Weiwei; Xu, Huan; Rong, Zhiwei; Li, Kang

    2018-04-16

    We sought to identify stable single-gene prognostic signatures based on a large collection of advanced stage serous ovarian cancer (AS-OvCa) gene expression data and explore their functions. The empirical Bayes (EB) method was used to remove the batch effect and integrate 8 ovarian cancer datasets. Univariate Cox regression was used to evaluate the association between gene and overall survival (OS). The Database for Annotation, Visualization and Integrated Discovery (DAVID) tool was used for the functional annotation of genes for Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The batch effect was removed by the EB method, and 1257 patient samples were used for further analysis. We selected 341 single-gene prognostic signatures with FDR matrix organization, focal adhesion and DNA replication which are closely associated with cancer. We used the EB method to remove the batch effect of 8 datasets, integrated these datasets and identified stable prognosis signatures for AS-OvCa.

  17. Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.

    Science.gov (United States)

    Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo

    Unsupervised object discovery and localization is to discover some dominant object classes and localize all of object instances from a given image collection without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, in those methods no prior knowledge for the given image collection is exploited to facilitate object discovery. On the other hand, the topic models used in those methods suffer from the topic coherence issue-some inferred topics do not have clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in terms of the so-called must-links are exploited from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better deal with the polysemy phenomenon of visual words, the must-link is re-defined as that one must-link only constrains one or some topic(s) instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes, thus the must-links in our approach are semantic-specific , which allows to more efficiently exploit discriminative prior knowledge from Web images. Extensive experiments validated the efficiency of our proposed approach on several data sets. It is shown that our method significantly improves topic coherence and outperforms the unsupervised methods for object discovery and localization. In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.Unsupervised object discovery and localization is to discover some dominant object classes and localize all of object

  18. Mass Spectrometry–Based Biomarker Discovery: Toward a Global Proteome Index of Individuality

    Science.gov (United States)

    Hawkridge, Adam M.; Muddiman, David C.

    2011-01-01

    Biomarker discovery and proteomics have become synonymous with mass spectrometry in recent years. Although this conflation is an injustice to the many essential biomolecular techniques widely used in biomarker-discovery platforms, it underscores the power and potential of contemporary mass spectrometry. Numerous novel and powerful technologies have been developed around mass spectrometry, proteomics, and biomarker discovery over the past 20 years to globally study complex proteomes (e.g., plasma). However, very few large-scale longitudinal studies have been carried out using these platforms to establish the analytical variability relative to true biological variability. The purpose of this review is not to cover exhaustively the applications of mass spectrometry to biomarker discovery, but rather to discuss the analytical methods and strategies that have been developed for mass spectrometry–based biomarker-discovery platforms and to place them in the context of the many challenges and opportunities yet to be addressed. PMID:20636062

  19. Repurposed transcriptomic data facilitate discovery of innate immunity toll-like receptor (TLR) Genes across Lophotrochozoa.

    Science.gov (United States)

    Halanych, Kenneth M; Kocot, Kevin M

    2014-10-01

    The growing volume of genomic data from across life represents opportunities for deriving valuable biological information from data that were initially collected for another purpose. Here, we use transcriptomes collected for phylogenomic studies to search for toll-like receptor (TLR) genes in poorly sampled lophotrochozoan clades (Annelida, Mollusca, Brachiopoda, Phoronida, and Entoprocta) and one ecdysozoan clade (Priapulida). TLR genes are involved in innate immunity across animals by recognizing potential microbial infection. They have an extracellular leucine-rich repeat (LRR) domain connected to a transmembrane domain and an intracellular toll/interleukin-1 receptor (TIR) domain. Consequently, these genes are important in initiating a signaling pathway to trigger defense. We found at least one TLR ortholog in all but two taxa examined, suggesting that a broad array of lophotrochozoans may have innate immune systems similar to those observed in vertebrates and arthropods. Comparison to the SMART database confirmed the presence of both the LRR and the TIR protein motifs characteristic of TLR genes. Because we looked at only one transcriptome per species, discovery of TLR genes was limited for most taxa. However, several TRL-like genes that vary in the number and placement of LRR domains were found in phoronids. Additionally, several contigs contained LRR domains but lacked TIR domains, suggesting they were not TLRs. Many of these LRR-containing contigs had other domains (e.g., immunoglobin) and are likely involved in innate immunity. © 2014 Marine Biological Laboratory.

  20. Gene Discovery through Genomic Sequencing of Brucella abortus

    Science.gov (United States)

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and human. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10−5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  1. promoting self directed learning in simulation based discovery learning environments through intelligent support.

    NARCIS (Netherlands)

    Veermans, K.H.; de Jong, Anthonius J.M.; van Joolingen, Wouter

    2000-01-01

    Providing learners with computer-generated feedback on their learning process in simulationbased discovery environments cannot be based on a detailed model of the learning process due to the “open” character of discovery learning. This paper describes a method for generating adaptive feedback for

  2. Helping Students Understand Gene Regulation with Online Tools: A Review of MEME and Melina II, Motif Discovery Tools for Active Learning in Biology

    Directory of Open Access Journals (Sweden)

    David Treves

    2012-08-01

    Full Text Available Review of: MEME and Melina II, which are two free and easy-to-use online motif discovery tools that can be employed to actively engage students in learning about gene regulatory elements.

  3. Fragment-based approaches to the discovery of kinase inhibitors.

    Science.gov (United States)

    Mortenson, Paul N; Berdini, Valerio; O'Reilly, Marc

    2014-01-01

    Protein kinases are one of the most important families of drug targets, and aberrant kinase activity has been linked to a large number of disease areas. Although eminently targetable using small molecules, kinases present a number of challenges as drug targets, not least obtaining selectivity across such a large and relatively closely related target family. Fragment-based drug discovery involves screening simple, low-molecular weight compounds to generate initial hits against a target. These hits are then optimized to more potent compounds via medicinal chemistry, usually facilitated by structural biology. Here, we will present a number of recent examples of fragment-based approaches to the discovery of kinase inhibitors, detailing the construction of fragment-screening libraries, the identification and validation of fragment hits, and their optimization into potent and selective lead compounds. The advantages of fragment-based methodologies will be discussed, along with some of the challenges associated with using this route. Finally, we will present a number of key lessons derived both from our own experience running fragment screens against kinases and from a large number of published studies.

  4. Microwave-Assisted Esterification: A Discovery-Based Microscale Laboratory Experiment

    Science.gov (United States)

    Reilly, Maureen K.; King, Ryan P.; Wagner, Alexander J.; King, Susan M.

    2014-01-01

    An undergraduate organic chemistry laboratory experiment has been developed that features a discovery-based microscale Fischer esterification utilizing a microwave reactor. Students individually synthesize a unique ester from known sets of alcohols and carboxylic acids. Each student identifies the best reaction conditions given their particular…

  5. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    Science.gov (United States)

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  6. The heat is on: thermodynamic analysis in fragment-based drug discovery

    NARCIS (Netherlands)

    Edink, E.S.; Jansen, C.J.W.; Leurs, R.; De Esch, I.J.

    2010-01-01

    Thermodynamic analysis provides access to the determinants of binding affinity, enthalpy and entropy. In fragment-based drug discovery (FBDD), thermodynamic analysis provides a powerful tool to discriminate fragments based on their potential for successful optimization. The thermodynamic data

  7. A Wavelet-Based Approach to Pattern Discovery in Melodies

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David; Weyde, Tillman

    2016-01-01

    We present a computational method for pattern discovery based on the application of the wavelet transform to symbolic representations of melodies or monophonic voices. We model the importance of a discovered pattern in terms of the compression ratio that can be achieved by using it to describe...

  8. Biomimicry as a basis for drug discovery.

    Science.gov (United States)

    Kolb, V M

    1998-01-01

    Selected works are discussed which clearly demonstrate that mimicking various aspects of the process by which natural products evolved is becoming a powerful tool in contemporary drug discovery. Natural products are an established and rich source of drugs. The term "natural product" is often used synonymously with "secondary metabolite." Knowledge of genetics and molecular evolution helps us understand how biosynthesis of many classes of secondary metabolites evolved. One proposed hypothesis is termed "inventive evolution." It invokes duplication of genes, and mutation of the gene copies, among other genetic events. The modified duplicate genes, per se or in conjunction with other genetic events, may give rise to new enzymes, which, in turn, may generate new products, some of which may be selected for. Steps of the inventive evolution can be mimicked in several ways for purpose of drug discovery. For example, libraries of chemical compounds of any imaginable structure may be produced by combinatorial synthesis. Out of these libraries new active compounds can be selected. In another example, genetic system can be manipulated to produce modified natural products ("unnatural natural products"), from which new drugs can be selected. In some instances, similar natural products turn up in species that are not direct descendants of each other. This is presumably due to a horizontal gene transfer. The mechanism of this inter-species gene transfer can be mimicked in therapeutic gene delivery. Mimicking specifics or principles of chemical evolution including experimental and test-tube evolution also provides leads for new drug discovery.

  9. Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

    Science.gov (United States)

    Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

    2013-10-01

    The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  10. A Framework for Automatic Web Service Discovery Based on Semantics and NLP Techniques

    Directory of Open Access Journals (Sweden)

    Asma Adala

    2011-01-01

    Full Text Available As a greater number of Web Services are made available today, automatic discovery is recognized as an important task. To promote the automation of service discovery, different semantic languages have been created that allow describing the functionality of services in a machine interpretable form using Semantic Web technologies. The problem is that users do not have intimate knowledge about semantic Web service languages and related toolkits. In this paper, we propose a discovery framework that enables semantic Web service discovery based on keywords written in natural language. We describe a novel approach for automatic discovery of semantic Web services which employs Natural Language Processing techniques to match a user request, expressed in natural language, with a semantic Web service description. Additionally, we present an efficient semantic matching technique to compute the semantic distance between ontological concepts.

  11. Fragment-based drug discovery and its application to challenging drug targets.

    Science.gov (United States)

    Price, Amanda J; Howard, Steven; Cons, Benjamin D

    2017-11-08

    Fragment-based drug discovery (FBDD) is a technique for identifying low molecular weight chemical starting points for drug discovery. Since its inception 20 years ago, FBDD has grown in popularity to the point where it is now an established technique in industry and academia. The approach involves the biophysical screening of proteins against collections of low molecular weight compounds (fragments). Although fragments bind to proteins with relatively low affinity, they form efficient, high quality binding interactions with the protein architecture as they have to overcome a significant entropy barrier to bind. Of the biophysical methods available for fragment screening, X-ray protein crystallography is one of the most sensitive and least prone to false positives. It also provides detailed structural information of the protein-fragment complex at the atomic level. Fragment-based screening using X-ray crystallography is therefore an efficient method for identifying binding hotspots on proteins, which can then be exploited by chemists and biologists for the discovery of new drugs. The use of FBDD is illustrated here with a recently published case study of a drug discovery programme targeting the challenging protein-protein interaction Kelch-like ECH-associated protein 1:nuclear factor erythroid 2-related factor 2. © 2017 The Author(s). Published by Portland Press Limited on behalf of the Biochemical Society.

  12. Discovery of the leinamycin family of natural products by mining actinobacterial genomes.

    Science.gov (United States)

    Pan, Guohui; Xu, Zhengren; Guo, Zhikai; Hindra; Ma, Ming; Yang, Dong; Zhou, Hao; Gansemans, Yannick; Zhu, Xiangcheng; Huang, Yong; Zhao, Li-Xing; Jiang, Yi; Cheng, Jinhua; Van Nieuwerburgh, Filip; Suh, Joo-Won; Duan, Yanwen; Shen, Ben

    2017-12-26

    Nature's ability to generate diverse natural products from simple building blocks has inspired combinatorial biosynthesis. The knowledge-based approach to combinatorial biosynthesis has allowed the production of designer analogs by rational metabolic pathway engineering. While successful, structural alterations are limited, with designer analogs often produced in compromised titers. The discovery-based approach to combinatorial biosynthesis complements the knowledge-based approach by exploring the vast combinatorial biosynthesis repertoire found in Nature. Here we showcase the discovery-based approach to combinatorial biosynthesis by targeting the domain of unknown function and cysteine lyase domain (DUF-SH) didomain, specific for sulfur incorporation from the leinamycin (LNM) biosynthetic machinery, to discover the LNM family of natural products. By mining bacterial genomes from public databases and the actinomycetes strain collection at The Scripps Research Institute, we discovered 49 potential producers that could be grouped into 18 distinct clades based on phylogenetic analysis of the DUF-SH didomains. Further analysis of the representative genomes from each of the clades identified 28 lnm -type gene clusters. Structural diversities encoded by the LNM-type biosynthetic machineries were predicted based on bioinformatics and confirmed by in vitro characterization of selected adenylation proteins and isolation and structural elucidation of the guangnanmycins and weishanmycins. These findings demonstrate the power of the discovery-based approach to combinatorial biosynthesis for natural product discovery and structural diversity and highlight Nature's rich biosynthetic repertoire. Comparative analysis of the LNM-type biosynthetic machineries provides outstanding opportunities to dissect Nature's biosynthetic strategies and apply these findings to combinatorial biosynthesis for natural product discovery and structural diversity.

  13. Construction of functional linkage gene networks by data integration.

    Science.gov (United States)

    Linghu, Bolan; Franzosa, Eric A; Xia, Yu

    2013-01-01

    Networks of functional associations between genes have recently been successfully used for gene function and disease-related research. A typical approach for constructing such functional linkage gene networks (FLNs) is based on the integration of diverse high-throughput functional genomics datasets. Data integration is a nontrivial task due to the heterogeneous nature of the different data sources and their variable accuracy and completeness. The presence of correlations between data sources also adds another layer of complexity to the integration process. In this chapter we discuss an approach for constructing a human FLN from data integration and a subsequent application of the FLN to novel disease gene discovery. Similar approaches can be applied to nonhuman species and other discovery tasks.

  14. Using Just-in-Time Information to Support Scientific Discovery Learning in a Computer-Based Simulation

    Science.gov (United States)

    Hulshof, Casper D.; de Jong, Ton

    2006-01-01

    Students encounter many obstacles during scientific discovery learning with computer-based simulations. It is hypothesized that an effective type of support, that does not interfere with the scientific discovery learning process, should be delivered on a "just-in-time" base. This study explores the effect of facilitating access to…

  15. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    Science.gov (United States)

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quick and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manual reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  16. Fragment-based approaches to anti-HIV drug discovery: state of the art and future opportunities.

    Science.gov (United States)

    Huang, Boshi; Kang, Dongwei; Zhan, Peng; Liu, Xinyong

    2015-12-01

    The search for additional drugs to treat HIV infection is a continuing effort due to the emergence and spread of HIV strains resistant to nearly all current drugs. The recent literature reveals that fragment-based drug design/discovery (FBDD) has become an effective alternative to conventional high-throughput screening strategies for drug discovery. In this critical review, the authors describe the state of the art in FBDD strategies for the discovery of anti-HIV drug-like compounds. The article focuses on fragment screening techniques, direct fragment-based design and early hit-to-lead progress. Rapid progress in biophysical detection and in silico techniques has greatly aided the application of FBDD to discover candidate agents directed at a variety of anti-HIV targets. Growing evidence suggests that structural insights on key proteins in the HIV life cycle can be applied in the early phase of drug discovery campaigns, providing valuable information on the binding modes and efficiently prompting fragment hit-to-lead progression. The combination of structural insights with improved methodologies for FBDD, including the privileged fragment-based reconstruction approach, fragment hybridization based on crystallographic overlays, fragment growth exploiting dynamic combinatorial chemistry, and high-speed fragment assembly via diversity-oriented synthesis followed by in situ screening, offers the possibility of more efficient and rapid discovery of novel drugs for HIV-1 prevention or treatment. Though the use of FBDD in anti-HIV drug discovery is still in its infancy, it is anticipated that anti-HIV agents developed via fragment-based strategies will be introduced into the clinic in the future.

  17. On reliable discovery of molecular signatures

    Directory of Open Access Journals (Sweden)

    Björkegren Johan

    2009-01-01

    Full Text Available Abstract Background Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control error rates such as the false discovery rate (FDR in signature discovery. Moreover, signatures for cancer gene expression have been shown to be unstable, that is, difficult to replicate in independent studies, casting doubts on their reliability. Results We demonstrate that with modern prediction methods, signatures that yield accurate predictions may still have a high FDR. Further, we show that even signatures with low FDR may fail to replicate in independent studies due to limited statistical power. Thus, neither stability nor predictive accuracy are relevant when FDR control is the primary goal. We therefore develop a general statistical hypothesis testing framework that for the first time provides FDR control for signature discovery. Our method is demonstrated to be correct in simulation studies. When applied to five cancer data sets, the method was able to discover molecular signatures with 5% FDR in three cases, while two data sets yielded no significant findings. Conclusion Our approach enables reliable discovery of molecular signatures from genome-wide data with current sample sizes. The statistical framework developed herein is potentially applicable to a wide range of prediction problems in bioinformatics.

  18. Antibiotic discovery throughout the Small World Initiative: A molecular strategy to identify biosynthetic gene clusters involved in antagonistic activity.

    Science.gov (United States)

    Davis, Elizabeth; Sloan, Tyler; Aurelius, Krista; Barbour, Angela; Bodey, Elijah; Clark, Brigette; Dennis, Celeste; Drown, Rachel; Fleming, Megan; Humbert, Allison; Glasgo, Elizabeth; Kerns, Trent; Lingro, Kelly; McMillin, MacKenzie; Meyer, Aaron; Pope, Breanna; Stalevicz, April; Steffen, Brittney; Steindl, Austin; Williams, Carolyn; Wimberley, Carmen; Zenas, Robert; Butela, Kristen; Wildschutte, Hans

    2017-06-01

    The emergence of bacterial pathogens resistant to all known antibiotics is a global health crisis. Adding to this problem is that major pharmaceutical companies have shifted away from antibiotic discovery due to low profitability. As a result, the pipeline of new antibiotics is essentially dry and many bacteria now resist the effects of most commonly used drugs. To address this global health concern, citizen science through the Small World Initiative (SWI) was formed in 2012. As part of SWI, students isolate bacteria from their local environments, characterize the strains, and assay for antibiotic production. During the 2015 fall semester at Bowling Green State University, students isolated 77 soil-derived bacteria and genetically characterized strains using the 16S rRNA gene, identified strains exhibiting antagonistic activity, and performed an expanded SWI workflow using transposon mutagenesis to identify a biosynthetic gene cluster involved in toxigenic compound production. We identified one mutant with loss of antagonistic activity and through subsequent whole-genome sequencing and linker-mediated PCR identified a 24.9 kb biosynthetic gene locus likely involved in inhibitory activity in that mutant. Further assessment against human pathogens demonstrated the inhibition of Bacillus cereus, Listeria monocytogenes, and methicillin-resistant Staphylococcus aureus in the presence of this compound, thus supporting our molecular strategy as an effective research pipeline for SWI antibiotic discovery and genetic characterization. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  19. Augmented Reality-Based Simulators as Discovery Learning Tools: An Empirical Study

    Science.gov (United States)

    Ibáñez, María-Blanca; Di-Serio, Ángela; Villarán-Molina, Diego; Delgado-Kloos, Carlos

    2015-01-01

    This paper reports empirical evidence on having students use AR-SaBEr, a simulation tool based on augmented reality (AR), to discover the basic principles of electricity through a series of experiments. AR-SaBEr was enhanced with knowledge-based support and inquiry-based scaffolding mechanisms, which proved useful for discovery learning in…

  20. Effector genomics accelerates discovery and functional profiling of potato disease resistance and phytophthora infestans avirulence genes.

    Directory of Open Access Journals (Sweden)

    Vivianne G A A Vleeshouwers

    Full Text Available Potato is the world's fourth largest food crop yet it continues to endure late blight, a devastating disease caused by the Irish famine pathogen Phytophthora infestans. Breeding broad-spectrum disease resistance (R genes into potato (Solanum tuberosum is the best strategy for genetically managing late blight but current approaches are slow and inefficient. We used a repertoire of effector genes predicted computationally from the P. infestans genome to accelerate the identification, functional characterization, and cloning of potentially broad-spectrum R genes. An initial set of 54 effectors containing a signal peptide and a RXLR motif was profiled for activation of innate immunity (avirulence or Avr activity on wild Solanum species and tentative Avr candidates were identified. The RXLR effector family IpiO induced hypersensitive responses (HR in S. stoloniferum, S. papita and the more distantly related S. bulbocastanum, the source of the R gene Rpi-blb1. Genetic studies with S. stoloniferum showed cosegregation of resistance to P. infestans and response to IpiO. Transient co-expression of IpiO with Rpi-blb1 in a heterologous Nicotiana benthamiana system identified IpiO as Avr-blb1. A candidate gene approach led to the rapid cloning of S. stoloniferum Rpi-sto1 and S. papita Rpi-pta1, which are functionally equivalent to Rpi-blb1. Our findings indicate that effector genomics enables discovery and functional profiling of late blight R genes and Avr genes at an unprecedented rate and promises to accelerate the engineering of late blight resistant potato varieties.

  1. Contributions of computational chemistry and biophysical techniques to fragment-based drug discovery.

    Science.gov (United States)

    Gozalbes, Rafael; Carbajo, Rodrigo J; Pineda-Lucena, Antonio

    2010-01-01

    In the last decade, fragment-based drug discovery (FBDD) has evolved from a novel approach in the search of new hits to a valuable alternative to the high-throughput screening (HTS) campaigns of many pharmaceutical companies. The increasing relevance of FBDD in the drug discovery universe has been concomitant with an implementation of the biophysical techniques used for the detection of weak inhibitors, e.g. NMR, X-ray crystallography or surface plasmon resonance (SPR). At the same time, computational approaches have also been progressively incorporated into the FBDD process and nowadays several computational tools are available. These stretch from the filtering of huge chemical databases in order to build fragment-focused libraries comprising compounds with adequate physicochemical properties, to more evolved models based on different in silico methods such as docking, pharmacophore modelling, QSAR and virtual screening. In this paper we will review the parallel evolution and complementarities of biophysical techniques and computational methods, providing some representative examples of drug discovery success stories by using FBDD.

  2. Discovery of accessible locations using region-based geo-social data

    KAUST Repository

    Wang, Yan; Li, Jianmin; Zhong, Ying; Zhu, Shunzhi; Guo, Danhuai; Shang, Shuo

    2018-01-01

    Geo-social data plays a significant role in location discovery and recommendation. In this light, we propose and study a novel problem of discovering accessible locations in spatial networks using region-based geo-social data. Given a set Q of query

  3. Aptamer-based multiplexed proteomic technology for biomarker discovery.

    Directory of Open Access Journals (Sweden)

    Larry Gold

    2010-12-01

    Full Text Available The interrogation of proteomes ("proteomics" in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology and medicine.We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma. Our current assay measures 813 proteins with low limits of detection (1 pM median, 7 logs of overall dynamic range (~100 fM-1 µM, and 5% median coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding signature of DNA aptamer concentrations, which is quantified on a DNA microarray. Our assay takes advantage of the dual nature of aptamers as both folded protein-binding entities with defined shapes and unique nucleotide sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD. We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to rapidly discover unique protein signatures characteristic of various disease states.We describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.

  4. Aptamer-based multiplexed proteomic technology for biomarker discovery.

    Science.gov (United States)

    Gold, Larry; Ayers, Deborah; Bertino, Jennifer; Bock, Christopher; Bock, Ashley; Brody, Edward N; Carter, Jeff; Dalby, Andrew B; Eaton, Bruce E; Fitzwater, Tim; Flather, Dylan; Forbes, Ashley; Foreman, Trudi; Fowler, Cate; Gawande, Bharat; Goss, Meredith; Gunn, Magda; Gupta, Shashi; Halladay, Dennis; Heil, Jim; Heilig, Joe; Hicke, Brian; Husar, Gregory; Janjic, Nebojsa; Jarvis, Thale; Jennings, Susan; Katilius, Evaldas; Keeney, Tracy R; Kim, Nancy; Koch, Tad H; Kraemer, Stephan; Kroiss, Luke; Le, Ngan; Levine, Daniel; Lindsey, Wes; Lollo, Bridget; Mayfield, Wes; Mehan, Mike; Mehler, Robert; Nelson, Sally K; Nelson, Michele; Nieuwlandt, Dan; Nikrad, Malti; Ochsner, Urs; Ostroff, Rachel M; Otis, Matt; Parker, Thomas; Pietrasiewicz, Steve; Resnicow, Daniel I; Rohloff, John; Sanders, Glenn; Sattin, Sarah; Schneider, Daniel; Singer, Britta; Stanton, Martin; Sterkel, Alana; Stewart, Alex; Stratford, Suzanne; Vaught, Jonathan D; Vrkljan, Mike; Walker, Jeffrey J; Watrobka, Mike; Waugh, Sheela; Weiss, Allison; Wilcox, Sheri K; Wolfson, Alexey; Wolk, Steven K; Zhang, Chi; Zichi, Dom

    2010-12-07

    The interrogation of proteomes ("proteomics") in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology and medicine. We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay measures 813 proteins with low limits of detection (1 pM median), 7 logs of overall dynamic range (~100 fM-1 µM), and 5% median coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding signature of DNA aptamer concentrations, which is quantified on a DNA microarray. Our assay takes advantage of the dual nature of aptamers as both folded protein-binding entities with defined shapes and unique nucleotide sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to rapidly discover unique protein signatures characteristic of various disease states. We describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.

  5. Keefektifan setting TPS dalam pendekatan discovery learning dan problem-based learning pada pembelajaran materi lingkaran SMP

    Directory of Open Access Journals (Sweden)

    Rahmi Hidayati

    2017-05-01

    The purpose of this study was to describe the effectiveness of setting Think Pair Share (TPS in the approach to discovery learning and problem-based learning in terms of student achievement, mathematical communication skills, and interpersonal skills of the student.  This study was a quasi-experimental study using the pretest-posttest nonequivalent group design. The research population comprised all Year VIII students of SMP Negeri 1 Yogyakarta. The research sample was randomly selected from eight classes, two classes were elected. The instrument used in this study is the learning achievement test, a test of mathematical communication skills, and interpersonal skills student questionnaires. To test the effectiveness of setting Think Pair Share (TPS in the approach to discovery learning and problem-based learning, the one sample t-test was carried out. Then, to investigate the difference in effectiveness between the setting Think Pair Share (TPS in the approach to discovery learning and problem-based learning, the Multivariate Analysis of Variance (MANOVA was carried out. The research findings indicate that the setting TPS discovery approach to learning and problem-based approach to learning (PBL is effective in terms of learning achievement, mathematical communication skills, and interpersonal skills of the students. No difference in effectiveness between setting TPS discovery approach to learning and problem-based learning (PBL in terms of learning achievement, mathematical communication skills, and interpersonal skills of the students. Keywords: TPS setting in discovery learning approach, in problem-based learning, academic achievement, mathematical communication skills, and interpersonal skills of the student

  6. Discovery of possible gene relationships through the application of self-organizing maps to DNA microarray databases.

    Science.gov (United States)

    Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres

    2014-01-01

    DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one of such techniques--an unsupervised artificial neural network called a Self-Organizing Map (SOM)-which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms.

  7. Functional gene-based discovery of phenazines from the actinobacteria associated with marine sponges in the South China Sea.

    Science.gov (United States)

    Karuppiah, Valliappan; Li, Yingxin; Sun, Wei; Feng, Guofang; Li, Zhiyong

    2015-07-01

    Phenazines represent a large group of nitrogen-containing heterocyclic compounds produced by the diverse group of bacteria including actinobacteria. In this study, a total of 197 actinobacterial strains were isolated from seven different marine sponge species in the South China Sea using five different culture media. Eighty-seven morphologically different actinobacterial strains were selected and grouped into 13 genera, including Actinoalloteichus, Kocuria, Micrococcus, Micromonospora, Mycobacterium, Nocardiopsis, Prauserella, Rhodococcus, Saccharopolyspora, Salinispora, Serinicoccus, and Streptomyces by the phylogenetic analysis of 16S rRNA gene. Based on the screening of phzE genes, ten strains, including five Streptomyces, two Nocardiopsis, one Salinispora, one Micrococcus, and one Serinicoccus were found to be potential for phenazine production. The level of phzE gene expression was highly expressed in Nocardiopsis sp. 13-33-15, 13-12-13, and Serinicoccus sp. 13-12-4 on the fifth day of fermentation. Finally, 1,6-dihydroxy phenazine (1) from Nocardiopsis sp. 13-33-15 and 13-12-13, and 1,6-dimethoxy phenazine (2) from Nocardiopsis sp. 13-33-15 were isolated and identified successfully based on ESI-MS and NMR analysis. The compounds 1 and 2 showed antibacterial activity against Bacillus mycoides SJ14, Staphylococcus aureus SJ51, Escherichia coli SJ42, and Micrococcus luteus SJ47. This study suggests that the integrated approach of gene screening and chemical analysis is an effective strategy to find the target compounds and lays the basis for the production of phenazine from the sponge-associated actinobacteria.

  8. Model-driven discovery of underground metabolic functions in Escherichia coli

    DEFF Research Database (Denmark)

    Guzmán, Gabriela I.; Utrilla, José; Nurk, Sergey

    2015-01-01

    -scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeting biological discovery, suggesting the presence......E, and gltA and prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations....

  9. Fragment-based drug discovery using rational design.

    Science.gov (United States)

    Jhoti, H

    2007-01-01

    Fragment-based drug discovery (FBDD) is established as an alternative approach to high-throughput screening for generating novel small molecule drug candidates. In FBDD, relatively small libraries of low molecular weight compounds (or fragments) are screened using sensitive biophysical techniques to detect their binding to the target protein. A lower absolute affinity of binding is expected from fragments, compared to much higher molecular weight hits detected by high-throughput screening, due to their reduced size and complexity. Through the use of iterative cycles of medicinal chemistry, ideally guided by three-dimensional structural data, it is often then relatively straightforward to optimize these weak binding fragment hits into potent and selective lead compounds. As with most other lead discovery methods there are two key components of FBDD; the detection technology and the compound library. In this review I outline the two main approaches used for detecting the binding of low affinity fragments and also some of the key principles that are used to generate a fragment library. In addition, I describe an example of how FBDD has led to the generation of a drug candidate that is now being tested in clinical trials for the treatment of cancer.

  10. Ataxin1L is a regulator of HSC function highlighting the utility of cross-tissue comparisons for gene discovery.

    Directory of Open Access Journals (Sweden)

    Juliette J Kahle

    2013-03-01

    Full Text Available Hematopoietic stem cells (HSCs are rare quiescent cells that continuously replenish the cellular components of the peripheral blood. Observing that the ataxia-associated gene Ataxin-1-like (Atxn1L was highly expressed in HSCs, we examined its role in HSC function through in vitro and in vivo assays. Mice lacking Atxn1L had greater numbers of HSCs that regenerated the blood more quickly than their wild-type counterparts. Molecular analyses indicated Atxn1L null HSCs had gene expression changes that regulate a program consistent with their higher level of proliferation, suggesting that Atxn1L is a novel regulator of HSC quiescence. To determine if additional brain-associated genes were candidates for hematologic regulation, we examined genes encoding proteins from autism- and ataxia-associated protein-protein interaction networks for their representation in hematopoietic cell populations. The interactomes were found to be highly enriched for proteins encoded by genes specifically expressed in HSCs relative to their differentiated progeny. Our data suggest a heretofore unappreciated similarity between regulatory modules in the brain and HSCs, offering a new strategy for novel gene discovery in both systems.

  11. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    Science.gov (United States)

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813

  12. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    Science.gov (United States)

    Huang, Zhong

    2014-01-01

    Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…

  13. Experiences in fragment-based drug discovery.

    Science.gov (United States)

    Murray, Christopher W; Verdonk, Marcel L; Rees, David C

    2012-05-01

    Fragment-based drug discovery (FBDD) has become established in both industry and academia as an alternative approach to high-throughput screening for the generation of chemical leads for drug targets. In FBDD, specialised detection methods are used to identify small chemical compounds (fragments) that bind to the drug target, and structural biology is usually employed to establish their binding mode and to facilitate their optimisation. In this article, we present three recent and successful case histories in FBDD. We then re-examine the key concepts and challenges of FBDD with particular emphasis on recent literature and our own experience from a substantial number of FBDD applications. Our opinion is that careful application of FBDD is living up to its promise of delivering high quality leads with good physical properties and that in future many drug molecules will be derived from fragment-based approaches. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. Systematic discovery of unannotated genes in 11 yeast species using a database of orthologous genomic segments

    LENUS (Irish Health Repository)

    OhEigeartaigh, Sean S

    2011-07-26

    Abstract Background In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically. Results We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis. Conclusions SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external

  15. Fragment-based drug discovery as alternative strategy to the drug development for neglected diseases.

    Science.gov (United States)

    Mello, Juliana da Fonseca Rezende E; Gomes, Renan Augusto; Vital-Fujii, Drielli Gomes; Ferreira, Glaucio Monteiro; Trossini, Gustavo Henrique Goulart

    2017-12-01

    Neglected diseases (NDs) affect large populations and almost whole continents, representing 12% of the global health burden. In contrast, the treatment available today is limited and sometimes ineffective. Under this scenery, the Fragment-Based Drug Discovery emerged as one of the most promising alternatives to the traditional methods of drug development. This method allows achieving new lead compounds with smaller size of fragment libraries. Even with the wide Fragment-Based Drug Discovery success resulting in new effective therapeutic agents against different diseases, until this moment few studies have been applied this approach for NDs area. In this article, we discuss the basic Fragment-Based Drug Discovery process, brief successful ideas of general applications and show a landscape of its use in NDs, encouraging the implementation of this strategy as an interesting way to optimize the development of new drugs to NDs. © 2017 John Wiley & Sons A/S.

  16. Using FlyBase, a Database of Drosophila Genes and Genomes.

    Science.gov (United States)

    Marygold, Steven J; Crosby, Madeline A; Goodman, Joshua L

    2016-01-01

    For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources.FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback.This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.

  17. ACFIS: a web server for fragment-based drug discovery

    Science.gov (United States)

    Hao, Ge-Fei; Jiang, Wen; Ye, Yuan-Nong; Wu, Feng-Xu; Zhu, Xiao-Lei; Guo, Feng-Biao; Yang, Guang-Fu

    2016-01-01

    In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown ‘chemical space’ to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for ‘chemical space’, which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/. PMID:27150808

  18. De-novo discovery of differentially abundant transcription factor binding sites including their positional preference.

    Science.gov (United States)

    Keilwagen, Jens; Grau, Jan; Paponov, Ivan A; Posch, Stefan; Strickert, Marc; Grosse, Ivo

    2011-02-10

    Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open

  19. Frequency-based time-series gene expression recomposition using PRIISM

    Directory of Open Access Journals (Sweden)

    Rosa Bruce A

    2012-06-01

    Full Text Available Abstract Background Circadian rhythm pathways influence the expression patterns of as much as 31% of the Arabidopsis genome through complicated interaction pathways, and have been found to be significantly disrupted by biotic and abiotic stress treatments, complicating treatment-response gene discovery methods due to clock pattern mismatches in the fold change-based statistics. The PRIISM (Pattern Recomposition for the Isolation of Independent Signals in Microarray data algorithm outlined in this paper is designed to separate pattern changes induced by different forces, including treatment-response pathways and circadian clock rhythm disruptions. Results Using the Fourier transform, high-resolution time-series microarray data is projected to the frequency domain. By identifying the clock frequency range from the core circadian clock genes, we separate the frequency spectrum to different sections containing treatment-frequency (representing up- or down-regulation by an adaptive treatment response, clock-frequency (representing the circadian clock-disruption response and noise-frequency components. Then, we project the components’ spectra back to the expression domain to reconstruct isolated, independent gene expression patterns representing the effects of the different influences. By applying PRIISM on a high-resolution time-series Arabidopsis microarray dataset under a cold treatment, we systematically evaluated our method using maximum fold change and principal component analyses. The results of this study showed that the ranked treatment-frequency fold change results produce fewer false positives than the original methodology, and the 26-hour timepoint in our dataset was the best statistic for distinguishing the most known cold-response genes. In addition, six novel cold-response genes were discovered. PRIISM also provides gene expression data which represents only circadian clock influences, and may be useful for circadian clock studies

  20. Characterization of Capsicum annuum genetic diversity and population structure based on parallel polymorphism discovery with a 30K unigene Pepper GeneChip.

    Science.gov (United States)

    Hill, Theresa A; Ashrafi, Hamid; Reyes-Chin-Wo, Sebastian; Yao, JiQiang; Stoffel, Kevin; Truco, Maria-Jose; Kozik, Alexander; Michelmore, Richard W; Van Deynze, Allen

    2013-01-01

    The widely cultivated pepper, Capsicum spp., important as a vegetable and spice crop world-wide, is one of the most diverse crops. To enhance breeding programs, a detailed characterization of Capsicum diversity including morphological, geographical and molecular data is required. Currently, molecular data characterizing Capsicum genetic diversity is limited. The development and application of high-throughput genome-wide markers in Capsicum will facilitate more detailed molecular characterization of germplasm collections, genetic relationships, and the generation of ultra-high density maps. We have developed the Pepper GeneChip® array from Affymetrix for polymorphism detection and expression analysis in Capsicum. Probes on the array were designed from 30,815 unigenes assembled from expressed sequence tags (ESTs). Our array design provides a maximum redundancy of 13 probes per base pair position allowing integration of multiple hybridization values per position to detect single position polymorphism (SPP). Hybridization of genomic DNA from 40 diverse C. annuum lines, used in breeding and research programs, and a representative from three additional cultivated species (C. frutescens, C. chinense and C. pubescens) detected 33,401 SPP markers within 13,323 unigenes. Among the C. annuum lines, 6,426 SPPs covering 3,818 unigenes were identified. An estimated three-fold reduction in diversity was detected in non-pungent compared with pungent lines, however, we were able to detect 251 highly informative markers across these C. annuum lines. In addition, an 8.7 cM region without polymorphism was detected around Pun1 in non-pungent C. annuum. An analysis of genetic relatedness and diversity using the software Structure revealed clustering of the germplasm which was confirmed with statistical support by principle components analysis (PCA) and phylogenetic analysis. This research demonstrates the effectiveness of parallel high-throughput discovery and application of genome

  1. Characterization of Capsicum annuum genetic diversity and population structure based on parallel polymorphism discovery with a 30K unigene Pepper GeneChip.

    Directory of Open Access Journals (Sweden)

    Theresa A Hill

    Full Text Available The widely cultivated pepper, Capsicum spp., important as a vegetable and spice crop world-wide, is one of the most diverse crops. To enhance breeding programs, a detailed characterization of Capsicum diversity including morphological, geographical and molecular data is required. Currently, molecular data characterizing Capsicum genetic diversity is limited. The development and application of high-throughput genome-wide markers in Capsicum will facilitate more detailed molecular characterization of germplasm collections, genetic relationships, and the generation of ultra-high density maps. We have developed the Pepper GeneChip® array from Affymetrix for polymorphism detection and expression analysis in Capsicum. Probes on the array were designed from 30,815 unigenes assembled from expressed sequence tags (ESTs. Our array design provides a maximum redundancy of 13 probes per base pair position allowing integration of multiple hybridization values per position to detect single position polymorphism (SPP. Hybridization of genomic DNA from 40 diverse C. annuum lines, used in breeding and research programs, and a representative from three additional cultivated species (C. frutescens, C. chinense and C. pubescens detected 33,401 SPP markers within 13,323 unigenes. Among the C. annuum lines, 6,426 SPPs covering 3,818 unigenes were identified. An estimated three-fold reduction in diversity was detected in non-pungent compared with pungent lines, however, we were able to detect 251 highly informative markers across these C. annuum lines. In addition, an 8.7 cM region without polymorphism was detected around Pun1 in non-pungent C. annuum. An analysis of genetic relatedness and diversity using the software Structure revealed clustering of the germplasm which was confirmed with statistical support by principle components analysis (PCA and phylogenetic analysis. This research demonstrates the effectiveness of parallel high-throughput discovery and

  2. Species-independent MicroRNA Gene Discovery

    KAUST Repository

    Kamanu, Timothy K.

    2012-01-01

    and other incurable diseases such as autism and Alzheimer’s. Functional miRNAs are excised from hairpin-like sequences that are known as miRNA genes. There are about 21,000 known miRNA genes, most of which have been determined using experimental methods. mi

  3. In silico discovery of transcription regulatory elements in Plasmodium falciparum

    Directory of Open Access Journals (Sweden)

    Le Roch Karine G

    2008-02-01

    Full Text Available Abstract Background With the sequence of the Plasmodium falciparum genome and several global mRNA and protein life cycle expression profiling projects now completed, elucidating the underlying networks of transcriptional control important for the progression of the parasite life cycle is highly pertinent to the development of new anti-malarials. To date, relatively little is known regarding the specific mechanisms the parasite employs to regulate gene expression at the mRNA level, with studies of the P. falciparum genome sequence having revealed few cis-regulatory elements and associated transcription factors. Although it is possible the parasite may evoke mechanisms of transcriptional control drastically different from those used by other eukaryotic organisms, the extreme AT-rich nature of P. falciparum intergenic regions (~90% AT presents significant challenges to in silico cis-regulatory element discovery. Results We have developed an algorithm called Gene Enrichment Motif Searching (GEMS that uses a hypergeometric-based scoring function and a position-weight matrix optimization routine to identify with high-confidence regulatory elements in the nucleotide-biased and repeat sequence-rich P. falciparum genome. When applied to promoter regions of genes contained within 21 co-expression gene clusters generated from P. falciparum life cycle microarray data using the semi-supervised clustering algorithm Ontology-based Pattern Identification, GEMS identified 34 putative cis-regulatory elements associated with a variety of parasite processes including sexual development, cell invasion, antigenic variation and protein biosynthesis. Among these candidates were novel motifs, as well as many of the elements for which biological experimental evidence already exists in the Plasmodium literature. To provide evidence for the biological relevance of a cell invasion-related element predicted by GEMS, reporter gene and electrophoretic mobility shift assays

  4. Gene/QTL discovery for Anthracnose in common bean (Phaseolus vulgaris L.) from North-western Himalayas.

    Science.gov (United States)

    Choudhary, Neeraj; Bawa, Vanya; Paliwal, Rajneesh; Singh, Bikram; Bhat, Mohd Ashraf; Mir, Javid Iqbal; Gupta, Moni; Sofi, Parvaze A; Thudi, Mahendar; Varshney, Rajeev K; Mir, Reyazul Rouf

    2018-01-01

    Common bean (Phaseolus vulgaris L.) is one of the most important grain legume crops in the world. The beans grown in north-western Himalayas possess huge diversity for seed color, shape and size but are mostly susceptible to Anthracnose disease caused by seed born fungus Colletotrichum lindemuthianum. Dozens of QTLs/genes have been already identified for this disease in common bean world-wide. However, this is the first report of gene/QTL discovery for Anthracnose using bean germplasm from north-western Himalayas of state Jammu & Kashmir, India. A core set of 96 bean lines comprising 54 indigenous local landraces from 11 hot-spots and 42 exotic lines from 10 different countries were phenotyped at two locations (SKUAST-Jammu and Bhaderwah, Jammu) for Anthracnose resistance. The core set was also genotyped with genome-wide (91) random and trait linked SSR markers. The study of marker-trait associations (MTAs) led to the identification of 10 QTLs/genes for Anthracnose resistance. Among the 10 QTLs/genes identified, two MTAs are stable (BM45 & BM211), two MTAs (PVctt1 & BM211) are major explaining more than 20% phenotypic variation for Anthracnose and one MTA (BM211) is both stable and major. Six (06) genomic regions are reported for the first time, while as four (04) genomic regions validated the already known QTL/gene regions/clusters for Anthracnose. The major, stable and validated markers reported during the present study associated with Anthracnose resistance will prove useful in common bean molecular breeding programs aimed at enhancing Anthracnose resistance of local bean landraces grown in north-western Himalayas of state Jammu and Kashmir.

  5. Lessons from hot spot analysis for fragment-based drug discovery

    Science.gov (United States)

    Hall, David R.; Vajda, Sandor

    2015-01-01

    Analysis of binding energy hot spots at protein surfaces can provide crucial insights into the prospects for successful application of fragment-based drug discovery (FBDD), and whether a fragment hit can be advanced into a high affinity, druglike ligand. The key factor is the strength of the top ranking hot spot, and how well a given fragment complements it. We show that published data are sufficient to provide a sophisticated and quantitative understanding of how hot spots derive from protein three-dimensional structure, and how their strength, number and spatial arrangement govern the potential for a surface site to bind to fragment-sized and larger ligands. This improved understanding provides important guidance for the effective application of FBDD in drug discovery. PMID:26538314

  6. Discovery of Boolean metabolic networks: integer linear programming based approach.

    Science.gov (United States)

    Qiu, Yushan; Jiang, Hao; Ching, Wai-Ki; Cheng, Xiaoqing

    2018-04-11

    Traditional drug discovery methods focused on the efficacy of drugs rather than their toxicity. However, toxicity and/or lack of efficacy are produced when unintended targets are affected in metabolic networks. Thus, identification of biological targets which can be manipulated to produce the desired effect with minimum side-effects has become an important and challenging topic. Efficient computational methods are required to identify the drug targets while incurring minimal side-effects. In this paper, we propose a graph-based computational damage model that summarizes the impact of enzymes on compounds in metabolic networks. An efficient method based on Integer Linear Programming formalism is then developed to identify the optimal enzyme-combination so as to minimize the side-effects. The identified target enzymes for known successful drugs are then verified by comparing the results with those in the existing literature. Side-effects reduction plays a crucial role in the study of drug development. A graph-based computational damage model is proposed and the theoretical analysis states the captured problem is NP-completeness. The proposed approaches can therefore contribute to the discovery of drug targets. Our developed software is available at " http://hkumath.hku.hk/~wkc/APBC2018-metabolic-network.zip ".

  7. Emerging techniques for the discovery and validation of therapeutic targets for skeletal diseases.

    Science.gov (United States)

    Cho, Christine H; Nuttall, Mark E

    2002-12-01

    Advances in genomics and proteomics have revolutionised the drug discovery process and target validation. Identification of novel therapeutic targets for chronic skeletal diseases is an extremely challenging process based on the difficulty of obtaining high-quality human diseased versus normal tissue samples. The quality of tissue and genomic information obtained from the sample is critical to identifying disease-related genes. Using a genomics-based approach, novel genes or genes with similar homology to existing genes can be identified from cDNA libraries generated from normal versus diseased tissue. High-quality cDNA libraries are prepared from uncontaminated homogeneous cell populations harvested from tissue sections of interest. Localised gene expression analysis and confirmation are obtained through in situ hybridisation or immunohistochemical studies. Cells overexpressing the recombinant protein are subsequently designed for primary cell-based high-throughput assays that are capable of screening large compound banks for potential hits. Afterwards, secondary functional assays are used to test promising compounds. The same overexpressing cells are used in the secondary assay to test protein activity and functionality as well as screen for small-molecule agonists or antagonists. Once a hit is generated, a structure-activity relationship of the compound is optimised for better oral bioavailability and pharmacokinetics allowing the compound to progress into development. Parallel efforts from proteomics, as well as genetics/transgenics, bioinformatics and combinatorial chemistry, and improvements in high-throughput automation technologies, allow the drug discovery process to meet the demands of the medicinal market. This review discusses and illustrates how different approaches are incorporated into the discovery and validation of novel targets and, consequently, the development of potentially therapeutic agents in the areas of osteoporosis and osteoarthritis

  8. The limits of de novo DNA motif discovery.

    Directory of Open Access Journals (Sweden)

    David Simcha

    Full Text Available A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery-searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. An implementation of

  9. ACFIS: a web server for fragment-based drug discovery.

    Science.gov (United States)

    Hao, Ge-Fei; Jiang, Wen; Ye, Yuan-Nong; Wu, Feng-Xu; Zhu, Xiao-Lei; Guo, Feng-Biao; Yang, Guang-Fu

    2016-07-08

    In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown 'chemical space' to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for 'chemical space', which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Sports Stars: Analyzing the Performance of Astronomers at Visualization-based Discovery

    Science.gov (United States)

    Fluke, C. J.; Parrington, L.; Hegarty, S.; MacMahon, C.; Morgan, S.; Hassan, A. H.; Kilborn, V. A.

    2017-05-01

    In this data-rich era of astronomy, there is a growing reliance on automated techniques to discover new knowledge. The role of the astronomer may change from being a discoverer to being a confirmer. But what do astronomers actually look at when they distinguish between “sources” and “noise?” What are the differences between novice and expert astronomers when it comes to visual-based discovery? Can we identify elite talent or coach astronomers to maximize their potential for discovery? By looking to the field of sports performance analysis, we consider an established, domain-wide approach, where the expertise of the viewer (i.e., a member of the coaching team) plays a crucial role in identifying and determining the subtle features of gameplay that provide a winning advantage. As an initial case study, we investigate whether the SportsCode performance analysis software can be used to understand and document how an experienced Hi astronomer makes discoveries in spectral data cubes. We find that the process of timeline-based coding can be applied to spectral cube data by mapping spectral channels to frames within a movie. SportsCode provides a range of easy to use methods for annotation, including feature-based codes and labels, text annotations associated with codes, and image-based drawing. The outputs, including instance movies that are uniquely associated with coded events, provide the basis for a training program or team-based analysis that could be used in unison with discipline specific analysis software. In this coordinated approach to visualization and analysis, SportsCode can act as a visual notebook, recording the insight and decisions in partnership with established analysis methods. Alternatively, in situ annotation and coding of features would be a valuable addition to existing and future visualization and analysis packages.

  11. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    OpenAIRE

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Deng, Zixin; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’....

  12. Network-Guided Key Gene Discovery for a Given Cellular Process

    DEFF Research Database (Denmark)

    He, Feng Q; Ollert, Markus

    2018-01-01

    Identification of key genes for a given physiological or pathological process is an essential but still very challenging task for the entire biomedical research community. Statistics-based approaches, such as genome-wide association study (GWAS)- or quantitative trait locus (QTL)-related analysis...... have already made enormous contributions to identifying key genes associated with a given disease or phenotype, the success of which is however very much dependent on a huge number of samples. Recent advances in network biology, especially network inference directly from genome-scale data...

  13. A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

    Directory of Open Access Journals (Sweden)

    Mickael Orgeur

    2018-01-01

    Full Text Available The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.

  14. The effect of discovery learning and problem-based learning on middle school students’ self-regulated learning

    Science.gov (United States)

    Miatun, A.; Muntazhimah

    2018-01-01

    The aim of this research was to determine the effect of learning models on mathematics achievement viewed from student’s self-regulated learning. The learning model compared were discovery learning and problem-based learning. The population was all students at the grade VIII of Junior High School in Boyolali regency. The samples were students of SMPN 4 Boyolali, SMPN 6 Boyolali, and SMPN 4 Mojosongo. The instruments used were mathematics achievement tests and self-regulated learning questionnaire. The data were analyzed using unbalanced two-ways Anova. The conclusion was as follows: (1) discovery learning gives better achievement than problem-based learning. (2) Achievement of students who have high self-regulated learning was better than students who have medium and low self-regulated learning. (3) For discovery learning, achievement of students who have high self-regulated learning was better than students who have medium and low self-regulated learning. For problem-based learning, students who have high and medium self-regulated learning have the same achievement. (4) For students who have high self-regulated learning, discovery learning gives better achievement than problem-based learning. Students who have medium and low self-regulated learning, both learning models give the same achievement.

  15. Lessons from Hot Spot Analysis for Fragment-Based Drug Discovery.

    Science.gov (United States)

    Hall, David R; Kozakov, Dima; Whitty, Adrian; Vajda, Sandor

    2015-11-01

    Analysis of binding energy hot spots at protein surfaces can provide crucial insights into the prospects for successful application of fragment-based drug discovery (FBDD), and whether a fragment hit can be advanced into a high-affinity, drug-like ligand. The key factor is the strength of the top ranking hot spot, and how well a given fragment complements it. We show that published data are sufficient to provide a sophisticated and quantitative understanding of how hot spots derive from a protein 3D structure, and how their strength, number, and spatial arrangement govern the potential for a surface site to bind to fragment-sized and larger ligands. This improved understanding provides important guidance for the effective application of FBDD in drug discovery. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Microarray-based ultra-high resolution discovery of genomic deletion mutations

    Science.gov (United States)

    2014-01-01

    Background Oligonucleotide microarray-based comparative genomic hybridization (CGH) offers an attractive possible route for the rapid and cost-effective genome-wide discovery of deletion mutations. CGH typically involves comparison of the hybridization intensities of genomic DNA samples with microarray chip representations of entire genomes, and has widespread potential application in experimental research and medical diagnostics. However, the power to detect small deletions is low. Results Here we use a graduated series of Arabidopsis thaliana genomic deletion mutations (of sizes ranging from 4 bp to ~5 kb) to optimize CGH-based genomic deletion detection. We show that the power to detect smaller deletions (4, 28 and 104 bp) depends upon oligonucleotide density (essentially the number of genome-representative oligonucleotides on the microarray chip), and determine the oligonucleotide spacings necessary to guarantee detection of deletions of specified size. Conclusions Our findings will enhance a wide range of research and clinical applications, and in particular will aid in the discovery of genomic deletions in the absence of a priori knowledge of their existence. PMID:24655320

  17. Two combinatorial optimization problems for SNP discovery using base-specific cleavage and mass spectrometry.

    Science.gov (United States)

    Chen, Xin; Wu, Qiong; Sun, Ruimin; Zhang, Louxin

    2012-01-01

    The discovery of single-nucleotide polymorphisms (SNPs) has important implications in a variety of genetic studies on human diseases and biological functions. One valuable approach proposed for SNP discovery is based on base-specific cleavage and mass spectrometry. However, it is still very challenging to achieve the full potential of this SNP discovery approach. In this study, we formulate two new combinatorial optimization problems. While both problems are aimed at reconstructing the sample sequence that would attain the minimum number of SNPs, they search over different candidate sequence spaces. The first problem, denoted as SNP - MSP, limits its search to sequences whose in silico predicted mass spectra have all their signals contained in the measured mass spectra. In contrast, the second problem, denoted as SNP - MSQ, limits its search to sequences whose in silico predicted mass spectra instead contain all the signals of the measured mass spectra. We present an exact dynamic programming algorithm for solving the SNP - MSP problem and also show that the SNP - MSQ problem is NP-hard by a reduction from a restricted variation of the 3-partition problem. We believe that an efficient solution to either problem above could offer a seamless integration of information in four complementary base-specific cleavage reactions, thereby improving the capability of the underlying biotechnology for sensitive and accurate SNP discovery.

  18. Using Phenomic Analysis of Photosynthetic Function for Abiotic Stress Response Gene Discovery

    KAUST Repository

    Rungrat, Tepsuda

    2016-09-09

    Monitoring the photosynthetic performance of plants is a major key to understanding how plants adapt to their growth conditions. Stress tolerance traits have a high genetic complexity as plants are constantly, and unavoidably, exposed to numerous stress factors, which limits their growth rates in the natural environment. Arabidopsis thaliana, with its broad genetic diversity and wide climatic range, has been shown to successfully adapt to stressful conditions to ensure the completion of its life cycle. As a result, A. thaliana has become a robust and renowned plant model system for studying natural variation and conducting gene discovery studies. Genome wide association studies (GWAS) in restructured populations combining natural and recombinant lines is a particularly effective way to identify the genetic basis of complex traits. As most abiotic stresses affect photosynthetic activity, chlorophyll fluorescence measurements are a potential phenotyping technique for monitoring plant performance under stress conditions. This review focuses on the use of chlorophyll fluorescence as a tool to study genetic variation underlying the stress tolerance responses to abiotic stress in A. thaliana.

  19. The Utility of Next Generation Sequencing in Gene Discovery for Mutation-negative Patients with Rett Syndrome

    Directory of Open Access Journals (Sweden)

    Wendy Anne Gold

    2015-07-01

    Full Text Available Rett syndrome (RTT is a rare, severe disorder of neuronal plasticity that predominantly affects girls. Girls with RTT usually appear asymptomatic in the first 6-18 months of life, but gradually develop severe motor, cognitive and behavioural abnormalities that persist for life. A predominance of neuronal and synaptic dysfunction, with altered excitatory-inhibitory neuronal synaptic transmission and synaptic plasticity are overarching features of RTT in children and in mouse models. Approximately 95% of patients with classical RTT have mutations in the X-linked methyl-CpG-binding (MECP2 gene, whilst other genes, including cyclin-dependent kinase-like 5 (CDKL5, Forkhead box protein G1 (FOXG1, Myocyte-specific enhancer factor 2C (MEF2C and Transcription factor 4 (TCF4, have been associated with phenotypes overlapping with RTT. However, there remain a proportion of patients who carry a clinical diagnosis of RTT, but who are mutation negative. In recent years, next-generation sequencing (NGS technologies have revolutionized approaches to genetic studies, making whole-exome and even whole-genome sequencing possible strategies for the detection of rare and de novo mutations, aiding the discovery of novel disease genes. Here, we review the recent progress that is emerging in identifying pathogenic variations, specifically from exome sequencing in RTT patients, and emphasize the need for the use of this technology to identify known and new disease genes in RTT patients.

  20. Down-Regulation of Gene Expression by RNA-Induced Gene Silencing

    Science.gov (United States)

    Travella, Silvia; Keller, Beat

    Down-regulation of endogenous genes via post-transcriptional gene silencing (PTGS) is a key to the characterization of gene function in plants. Many RNA-based silencing mechanisms such as post-transcriptional gene silencing, co-suppression, quelling, and RNA interference (RNAi) have been discovered among species of different kingdoms (plants, fungi, and animals). One of the most interesting discoveries was RNAi, a sequence-specific gene-silencing mechanism initiated by the introduction of double-stranded RNA (dsRNA), homologous in sequence to the silenced gene, which triggers degradation of mRNA. Infection of plants with modified viruses can also induce RNA silencing and is referred to as virus-induced gene silencing (VIGS). In contrast to insertional mutagenesis, these emerging new reverse genetic approaches represent a powerful tool for exploring gene function and for manipulating gene expression experimentally in cereal species such as barley and wheat. We examined how RNAi and VIGS have been used to assess gene function in barley and wheat, including molecular mechanisms involved in the process and available methodological elements, such as vectors, inoculation procedures, and analysis of silenced phenotypes.

  1. AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery

    International Nuclear Information System (INIS)

    Tsai, Yingssu; McPhillips, Scott E.; González, Ana; McPhillips, Timothy M.; Zinn, Daniel; Cohen, Aina E.; Feese, Michael D.; Bushnell, David; Tiefenbrunn, Theresa; Stout, C. David; Ludaescher, Bertram; Hedman, Britt; Hodgson, Keith O.; Soltis, S. Michael

    2013-01-01

    New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully

  2. Context based service discovery in unmanaged networks using MDNS/DNS-SD

    NARCIS (Netherlands)

    Stolikj, M.; Cuijpers, P.J.L.; Lukkien, J.J.; Buchina, N.; Bellido, F.J.; Vun, N.C.H.; Dolar, C.; Diaz-Sanchez, D.; Ling, W.-K.

    2016-01-01

    We propose an extension of the mDNS/DNS-SD service discovery protocol, which enables service clients to discover and select services based on their context. The extension improves scalability in large networks, which is of particular importance in future Internet of Things deployments.

  3. Harvest: a web-based biomedical data discovery and reporting application development platform.

    Science.gov (United States)

    Italia, Michael J; Pennington, Jeffrey W; Ruth, Byron; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; Miller, Jeffrey; White, Peter S

    2013-01-01

    Biomedical researchers share a common challenge of making complex data understandable and accessible. This need is increasingly acute as investigators seek opportunities for discovery amidst an exponential growth in the volume and complexity of laboratory and clinical data. To address this need, we developed Harvest, an open source framework that provides a set of modular components to aid the rapid development and deployment of custom data discovery software applications. Harvest incorporates visual representations of multidimensional data types in an intuitive, web-based interface that promotes a real-time, iterative approach to exploring complex clinical and experimental data. The Harvest architecture capitalizes on standards-based, open source technologies to address multiple functional needs critical to a research and development environment, including domain-specific data modeling, abstraction of complex data models, and a customizable web client.

  4. Mass Spectrometry-Based Proteomics in Molecular Diagnostics: Discovery of Cancer Biomarkers Using Tissue Culture

    Directory of Open Access Journals (Sweden)

    Debasish Paul

    2013-01-01

    Full Text Available Accurate diagnosis and proper monitoring of cancer patients remain a key obstacle for successful cancer treatment and prevention. Therein comes the need for biomarker discovery, which is crucial to the current oncological and other clinical practices having the potential to impact the diagnosis and prognosis. In fact, most of the biomarkers have been discovered utilizing the proteomics-based approaches. Although high-throughput mass spectrometry-based proteomic approaches like SILAC, 2D-DIGE, and iTRAQ are filling up the pitfalls of the conventional techniques, still serum proteomics importunately poses hurdle in overcoming a wide range of protein concentrations, and also the availability of patient tissue samples is a limitation for the biomarker discovery. Thus, researchers have looked for alternatives, and profiling of candidate biomarkers through tissue culture of tumor cell lines comes up as a promising option. It is a rich source of tumor cell-derived proteins, thereby, representing a wide array of potential biomarkers. Interestingly, most of the clinical biomarkers in use today (CA 125, CA 15.3, CA 19.9, and PSA were discovered through tissue culture-based system and tissue extracts. This paper tries to emphasize the tissue culture-based discovery of candidate biomarkers through various mass spectrometry-based proteomic approaches.

  5. Mass Spectrometry-Based Proteomics in Molecular Diagnostics: Discovery of Cancer Biomarkers Using Tissue Culture

    Science.gov (United States)

    Paul, Debasish; Kumar, Avinash; Gajbhiye, Akshada; Santra, Manas K.; Srikanth, Rapole

    2013-01-01

    Accurate diagnosis and proper monitoring of cancer patients remain a key obstacle for successful cancer treatment and prevention. Therein comes the need for biomarker discovery, which is crucial to the current oncological and other clinical practices having the potential to impact the diagnosis and prognosis. In fact, most of the biomarkers have been discovered utilizing the proteomics-based approaches. Although high-throughput mass spectrometry-based proteomic approaches like SILAC, 2D-DIGE, and iTRAQ are filling up the pitfalls of the conventional techniques, still serum proteomics importunately poses hurdle in overcoming a wide range of protein concentrations, and also the availability of patient tissue samples is a limitation for the biomarker discovery. Thus, researchers have looked for alternatives, and profiling of candidate biomarkers through tissue culture of tumor cell lines comes up as a promising option. It is a rich source of tumor cell-derived proteins, thereby, representing a wide array of potential biomarkers. Interestingly, most of the clinical biomarkers in use today (CA 125, CA 15.3, CA 19.9, and PSA) were discovered through tissue culture-based system and tissue extracts. This paper tries to emphasize the tissue culture-based discovery of candidate biomarkers through various mass spectrometry-based proteomic approaches. PMID:23586059

  6. A hybrid computational method for the discovery of novel reproduction-related genes.

    Science.gov (United States)

    Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Guohua; Huang, Tao; Cai, Yu-Dong

    2015-01-01

    Uncovering the molecular mechanisms underlying reproduction is of great importance to infertility treatment and to the generation of healthy offspring. In this study, we discovered novel reproduction-related genes with a hybrid computational method, integrating three different types of method, which offered new clues for further reproduction research. This method was first executed on a weighted graph, constructed based on known protein-protein interactions, to search the shortest paths connecting any two known reproduction-related genes. Genes occurring in these paths were deemed to have a special relationship with reproduction. These newly discovered genes were filtered with a randomization test. Then, the remaining genes were further selected according to their associations with known reproduction-related genes measured by protein-protein interaction score and alignment score obtained by BLAST. The in-depth analysis of the high confidence novel reproduction genes revealed hidden mechanisms of reproduction and provided guidelines for further experimental validations.

  7. Data Mining and Knowledge Discovery via Logic-Based Methods

    CERN Document Server

    Triantaphyllou, Evangelos

    2010-01-01

    There are many approaches to data mining and knowledge discovery (DM&KD), including neural networks, closest neighbor methods, and various statistical methods. This monograph, however, focuses on the development and use of a novel approach, based on mathematical logic, that the author and his research associates have worked on over the last 20 years. The methods presented in the book deal with key DM&KD issues in an intuitive manner and in a natural sequence. Compared to other DM&KD methods, those based on mathematical logic offer a direct and often intuitive approach for extracting easily int

  8. Cancer Biomarker Discovery: Lectin-Based Strategies Targeting Glycoproteins

    Directory of Open Access Journals (Sweden)

    David Clark

    2012-01-01

    Full Text Available Biomarker discovery can identify molecular markers in various cancers that can be used for detection, screening, diagnosis, and monitoring of disease progression. Lectin-affinity is a technique that can be used for the enrichment of glycoproteins from a complex sample, facilitating the discovery of novel cancer biomarkers associated with a disease state.

  9. Discovery of inhibitors of bacterial histidine kinases

    NARCIS (Netherlands)

    Velikova, N.R.

    2014-01-01

    Discovery of Inhibitors of Bacterial Histidine Kinases Summary

    The thesis is on novel antibacterial drug discovery (http://youtu.be/NRMWOGgeysM). Using structure-based and fragment-based drug discovery approach, we have identified small-molecule histidine-kinase

  10. Sequence-Based Discovery Demonstrates That Fixed Light Chain Human Transgenic Rats Produce a Diverse Repertoire of Antigen-Specific Antibodies

    Directory of Open Access Journals (Sweden)

    Katherine E. Harris

    2018-04-01

    Full Text Available We created a novel transgenic rat that expresses human antibodies comprising a diverse repertoire of heavy chains with a single common rearranged kappa light chain (IgKV3-15-JK1. This fixed light chain animal, called OmniFlic, presents a unique system for human therapeutic antibody discovery and a model to study heavy chain repertoire diversity in the context of a constant light chain. The purpose of this study was to analyze heavy chain variable gene usage, clonotype diversity, and to describe the sequence characteristics of antigen-specific monoclonal antibodies (mAbs isolated from immunized OmniFlic animals. Using next-generation sequencing antibody repertoire analysis, we measured heavy chain variable gene usage and the diversity of clonotypes present in the lymph node germinal centers of 75 OmniFlic rats immunized with 9 different protein antigens. Furthermore, we expressed 2,560 unique heavy chain sequences sampled from a diverse set of clonotypes as fixed light chain antibody proteins and measured their binding to antigen by ELISA. Finally, we measured patterns and overall levels of somatic hypermutation in the full B-cell repertoire and in the 2,560 mAbs tested for binding. The results demonstrate that OmniFlic animals produce an abundance of antigen-specific antibodies with heavy chain clonotype diversity that is similar to what has been described with unrestricted light chain use in mammals. In addition, we show that sequence-based discovery is a highly effective and efficient way to identify a large number of diverse monoclonal antibodies to a protein target of interest.

  11. An explanation of resisted discoveries based on construal-level theory.

    Science.gov (United States)

    Fang, Hui

    2015-02-01

    New discoveries and theories are crucial for the development of science, but they are often initially resisted by the scientific community. This paper analyses resistance to scientific discoveries that supplement previous research results or conclusions with new phenomena, such as long chains in macromolecules, Alfvén waves, parity nonconservation in weak interactions and quasicrystals. Construal-level theory is used to explain that the probability of new discoveries may be underestimated because of psychological distance. Thus, the insufficiently examined scope of an accepted theory may lead to overstating the suitable scope and underestimating the probability of its undiscovered counter-examples. Therefore, psychological activity can result in people instinctively resisting new discoveries. Direct evidence can help people judge the validity of a hypothesis with rational thinking. The effects of authorities and textbooks on the resistance to discoveries are also discussed. From the results of our analysis, suggestions are provided to reduce resistance to real discoveries, which will benefit the development of science.

  12. A Nonlinear Model for Gene-Based Gene-Environment Interaction

    Directory of Open Access Journals (Sweden)

    Jian Sa

    2016-06-01

    Full Text Available A vast amount of literature has confirmed the role of gene-environment (G×E interaction in the etiology of complex human diseases. Traditional methods are predominantly focused on the analysis of interaction between a single nucleotide polymorphism (SNP and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single SNP effects are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single variant based approach, in this work, we proposed a sparse principle component regression (sPCR model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for SNPs in a gene, then the effect of each principal component was modeled by a varying-coefficient (VC model. The model can jointly model variants in a gene in which their effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR model has nice interpretation property since the sparsity on the principal component loadings can tell the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset in Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction method and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a system approach to evaluate gene-based G×E interaction.

  13. Systematic identification of latent disease-gene associations from PubMed articles.

    Science.gov (United States)

    Zhang, Yuji; Shen, Feichen; Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang

    2018-01-01

    Recent scientific advances have accumulated a tremendous amount of biomedical knowledge providing novel insights into the relationship between molecular and cellular processes and diseases. Literature mining is one of the commonly used methods to retrieve and extract information from scientific publications for understanding these associations. However, due to large data volume and complicated associations with noises, the interpretability of such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million of PubMed indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their capabilities of detecting latent associations and reducing noises for large volume data respectively. Our results demonstrate that (1) the LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns were enriched in the disease-specific association networks; and (4) genes were enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research.

  14. The pivotal role of multimodality reporter sensors in drug discovery: from cell based assays to real time molecular imaging.

    Science.gov (United States)

    Ray, Pritha

    2011-04-01

    Development and marketing of new drugs require stringent validation that are expensive and time consuming. Non-invasive multimodality molecular imaging using reporter genes holds great potential to expedite these processes at reduced cost. New generations of smarter molecular imaging strategies such as Split reporter, Bioluminescence resonance energy transfer, Multimodality fusion reporter technologies will further assist to streamline and shorten the drug discovery and developmental process. This review illustrates the importance and potential of molecular imaging using multimodality reporter genes in drug development at preclinical phases.

  15. Comparison between project-based learning and discovery learning toward students' metacognitive strategies on global warming concept

    Science.gov (United States)

    Tumewu, Widya Anjelia; Wulan, Ana Ratna; Sanjaya, Yayan

    2017-05-01

    The purpose of this study was to know comparing the effectiveness of learning using Project-based learning (PjBL) and Discovery Learning (DL) toward students metacognitive strategies on global warming concept. A quasi-experimental research design with a The Matching-Only Pretest-Posttest Control Group Design was used in this study. The subjects were students of two classes 7th grade of one of junior high school in Bandung City, West Java of 2015/2016 academic year. The study was conducted on two experimental class, that were project-based learning treatment on the experimental class I and discovery learning treatment was done on the experimental class II. The data was collected through questionnaire to know students metacognitive strategies. The statistical analysis showed that there were statistically significant differences in students metacognitive strategies between project-based learning and discovery learning.

  16. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

    Science.gov (United States)

    Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press.

  17. Orphan diseases: state of the drug discovery art.

    Science.gov (United States)

    Volmar, Claude-Henry; Wahlestedt, Claes; Brothers, Shaun P

    2017-06-01

    Since 1983 more than 300 drugs have been developed and approved for orphan diseases. However, considering the development of novel diagnosis tools, the number of rare diseases vastly outpaces therapeutic discovery. Academic centers and nonprofit institutes are now at the forefront of rare disease R&D, partnering with pharmaceutical companies when academic researchers discover novel drugs or targets for specific diseases, thus reducing the failure risk and cost for pharmaceutical companies. Considerable progress has occurred in the art of orphan drug discovery, and a symbiotic relationship now exists between pharmaceutical industry, academia, and philanthropists that provides a useful framework for orphan disease therapeutic discovery. Here, the current state-of-the-art of drug discovery for orphan diseases is reviewed. Current technological approaches and challenges for drug discovery are considered, some of which can present somewhat unique challenges and opportunities in orphan diseases, including the potential for personalized medicine, gene therapy, and phenotypic screening.

  18. Gene2Function: An Integrated Online Resource for Gene Function Discovery

    Directory of Open Access Journals (Sweden)

    Yanhui Hu

    2017-08-01

    Full Text Available One of the most powerful ways to develop hypotheses regarding the biological functions of conserved genes in a given species, such as humans, is to first look at what is known about their function in another species. Model organism databases and other resources are rich with functional information but difficult to mine. Gene2Function addresses a broad need by integrating information about conserved genes in a single online resource.

  19. Discovery of Conservation and Diversification of miR171 Genes by Phylogenetic Analysis based on Global Genomes

    Directory of Open Access Journals (Sweden)

    Xudong Zhu

    2015-07-01

    Full Text Available The microRNA171 (miR171 family is widely distributed and highly conserved in a range of species and plays critical roles in regulating plant growth and development through repressing expression of ( transcription factors. However, information on the evolutionary conservation and functional diversification of the miRNA171 family members remains scanty. We reconstructed the phylogenetic relationships among miR171 precursor and mature sequences so as to investigate the extent and degree of evolutionary conservation of miR171 in (L. Heynh. (ath, grape ( L. (vvi, poplar ( Torr. & A.Gray ex Hook. (ptc, and rice ( L. (osa. Despite strong conservation of over 80%, some mature miR171 sequences, such as , and and , -, and -, have undergone critical sequence variation, leading to functional diversification, since they target non gene transcript(s. Phylogenetic analyses revealed a combination of old ancestral relationships and recent lineage-specific diversification in the miR171 family within the four model plants. The -regulatory motifs on the upstream promoter sequences of genes were highly divergent and shared some similar elements, indicating their possible contribution to the functional variation observed within the miR171 family. This study will buttress our understanding of the functional differentiation of miRNAs and the relationships of miRNA–target pairs based on the evolutionary history of genes.

  20. Solutions for wood-based bio-energy price discovery

    Energy Technology Data Exchange (ETDEWEB)

    Teraes, Timo [FOEX Indexes Ltd., Helsinki (Finland)], e-mail: timo@foex.fi

    2012-11-01

    Energy prices are highly volatile. This volatility can have serious ill-effects on the profitability of companies engaged in the energy business. There are, however, a number of price risk management tools which can be used to reduce the problems caused by price volatility. International trade of wood pellets and wood chips is rapidly growing. A good price transparency helps in developing the trade further. In order to meet the renewable energy targets within the EU, further growth of volumes is needed, at least within Europe and from overseas supply sources to the European markets. Reliable price indices are a central element in price risk management and in general price discovery. Exchanges have provided, in the past, the most widely known price discovery systems. Since 1990's, an increasing number of price risk management tools has been based on cash settlement concept. Cash settlement requires high quality benchmark price indices. These have been developed by the exchanges themselves, by trade press and by independent price benchmark provider companies. The best known of these benchmarks in forest industry and now also in wood-based bioenergy products are the PIX indices, provided by FOEX Indexes Ltd. This presentation discusses the key requirements for a good price index and the different ways of using the indices. Price relationships between wood chip prices and pellet prices are also discussed as will be the outlook for the future volume growth and trade flows in woodchips and pellets mainly from the European perspective.

  1. Computational methods in drug discovery

    Directory of Open Access Journals (Sweden)

    Sumudu P. Leelananda

    2016-12-01

    Full Text Available The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein–ligand docking, pharmacophore modeling and QSAR techniques are reviewed.

  2. Toxins and drug discovery.

    Science.gov (United States)

    Harvey, Alan L

    2014-12-15

    Components from venoms have stimulated many drug discovery projects, with some notable successes. These are briefly reviewed, from captopril to ziconotide. However, there have been many more disappointments on the road from toxin discovery to approval of a new medicine. Drug discovery and development is an inherently risky business, and the main causes of failure during development programmes are outlined in order to highlight steps that might be taken to increase the chances of success with toxin-based drug discovery. These include having a clear focus on unmet therapeutic needs, concentrating on targets that are well-validated in terms of their relevance to the disease in question, making use of phenotypic screening rather than molecular-based assays, and working with development partners with the resources required for the long and expensive development process. Copyright © 2014 The Author. Published by Elsevier Ltd.. All rights reserved.

  3. Leveraging gene-environment interactions and endotypes for asthma gene discovery

    DEFF Research Database (Denmark)

    Bønnelykke, Klaus; Ober, Carole

    2016-01-01

    , such as childhood asthma with severe exacerbations, and on relevant exposures that are involved in gene-environment interactions (GEIs), such as rhinovirus infections, will improve detection of asthma genes and our understanding of the underlying mechanisms. We will discuss the challenges of considering GEIs......Asthma is a heterogeneous clinical syndrome that includes subtypes of disease with different underlying causes and disease mechanisms. Asthma is caused by a complex interaction between genes and environmental exposures; early-life exposures in particular play an important role. Asthma is also...... heritable, and a number of susceptibility variants have been discovered in genome-wide association studies, although the known risk alleles explain only a small proportion of the heritability. In this review, we present evidence supporting the hypothesis that focusing on more specific asthma phenotypes...

  4. Fragment-Based Drug Discovery in the Bromodomain and Extra-Terminal Domain Family.

    Science.gov (United States)

    Radwan, Mostafa; Serya, Rabah

    2017-08-01

    Bromodomain and extra-terminal domain (BET) inhibition has emerged recently as a potential therapeutic target for the treatment of many human disorders such as atherosclerosis, inflammatory disorders, chronic obstructive pulmonary disease (COPD), some viral infections, and cancer. Since the discovery of the two potent inhibitors, I-BET762 and JQ1, different research groups have used different techniques to develop novel potent and selective inhibitors. In this review, we will be concerned with the trials that used fragment-based drug discovery (FBDD) approaches to discover or optimize BET inhibitors, also showing fragments that can be further optimized in future projects to reach novel potent BET inhibitors. © 2017 Deutsche Pharmazeutische Gesellschaft.

  5. Application of mass spectrometry-based proteomics for biomarker discovery in neurological disorders

    Directory of Open Access Journals (Sweden)

    Venugopal Abhilash

    2009-01-01

    Full Text Available Mass spectrometry-based quantitative proteomics has emerged as a powerful approach that has the potential to accelerate biomarker discovery, both for diagnostic as well as therapeutic purposes. Proteomics has traditionally been synonymous with 2D gels but is increasingly shifting to the use of gel-free systems and liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS. Quantitative proteomic approaches have already been applied to investigate various neurological disorders, especially in the context of identifying biomarkers from cerebrospinal fluid and serum. This review highlights the scope of different applications of quantitative proteomics in understanding neurological disorders with special emphasis on biomarker discovery.

  6. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.

    Science.gov (United States)

    Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J

    2016-02-01

    Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence). gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. Computational and Experimental Approaches to Cancer Biomarker Discovery

    DEFF Research Database (Denmark)

    Krzystanek, Marcin

    of a patient’s response to a particular treatment, thus helping to avoid unnecessary treatment and unwanted side effects in non-responding individuals.Currently biomarker discovery is facilitated by recent advances in high-throughput technologies when association between a given biological phenotype...... and the state or level of a large number of molecular entities is investigated. Such associative analysis could be confounded by several factors, leading to false discoveries. For example, it is assumed that with the exception of the true biomarkers most molecular entities such as gene expression levels show...... random distribution in a given cohort. However, gene expression levels may also be affected by technical bias when the actual measurement technology or sample handling may introduce a systematic error. If the distribution of systematic errors correlates with the biological phenotype then the risk...

  8. A powerful score-based test statistic for detecting gene-gene co-association.

    Science.gov (United States)

    Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun

    2016-01-29

    The genetic variants identified by Genome-wide association study (GWAS) can only account for a small proportion of the total heritability for complex disease. The existence of gene-gene joint effects which contains the main effects and their co-association is one of the possible explanations for the "missing heritability" problems. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, not only due to the traditional interaction under nearly independent condition but the correlation between genes. Generally, genes tend to work collaboratively within specific pathway or network contributing to the disease and the specific disease-associated locus will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS has the better performance than other existed methods including single SNP-based and principle component analysis (PCA)-based logistic regression model, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and delta-square (δ (2)) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.

  9. Identifying candidate driver genes by integrative ovarian cancer genomics data

    Science.gov (United States)

    Lu, Xinguo; Lu, Jibo

    2017-08-01

    Integrative analysis of molecular mechanics underlying cancer can distinguish interactions that cannot be revealed based on one kind of data for the appropriate diagnosis and treatment of cancer patients. Tumor samples exhibit heterogeneity in omics data, such as somatic mutations, Copy Number Variations CNVs), gene expression profiles and so on. In this paper we combined gene co-expression modules and mutation modulators separately in tumor patients to obtain the candidate driver genes for resistant and sensitive tumor from the heterogeneous data. The final list of modulators identified are well known in biological processes associated with ovarian cancer, such as CCL17, CACTIN, CCL16, CCL22, APOB, KDF1, CCL11, HNF1B, LRG1, MED1 and so on, which can help to facilitate the discovery of biomarkers, molecular diagnostics, and drug discovery.

  10. Discovery and validation of information theory-based transcription factor and cofactor binding site motifs.

    Science.gov (United States)

    Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K

    2017-03-17

    Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. An integration of genome-wide association study and gene expression profiling to prioritize the discovery of novel susceptibility Loci for osteoporosis-related traits.

    Directory of Open Access Journals (Sweden)

    Yi-Hsiang Hsu

    2010-06-01

    Full Text Available Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far only explain a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD at the lumbar spine (LS and femoral neck (FN, as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW. A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men, revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6x10(-8, 2q11.2 (TBC1D8, and 18q11.2 (OSBPL1A, and confirmed a previously reported region near TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 out of these 3 genes (GPR177, p = 2.6x10(-13; SOX6, p = 6.4x10(-10 associated with BMD in women have been successfully replicated in a large-scale meta-analysis of BMD, but none of the non-prioritized candidates (associated with BMD did. Our results support the concept of our prioritization strategy. In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant

  12. Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison.

    Science.gov (United States)

    Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S; Sinha, Saurabh

    2011-12-01

    Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, 'enhancers'), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for 'motif-blind' CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to 'supervise' the search. We propose a new statistical method, based on 'Interpolated Markov Models', for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. © The Author(s) 2011. Published by Oxford University Press.

  13. Graph-Based Methods for Discovery Browsing with Semantic Predications

    DEFF Research Database (Denmark)

    Wilkowski, Bartlomiej; Fiszman, Marcelo; Miller, Christopher M

    2011-01-01

    . Poorly understood relationships may be explored through novel points of view, and potentially interesting relationships need not be known ahead of time. In a process of "cooperative reciprocity" the user iteratively focuses system output, thus controlling the large number of relationships often generated...... in literature-based discovery systems. The underlying technology exploits SemRep semantic predications represented as a graph of interconnected nodes (predication arguments) and edges (predicates). The system suggests paths in this graph, which represent chains of relationships. The methodology is illustrated...

  14. The Genetics of Obsessive-Compulsive Disorder and Tourette Syndrome: An Epidemiological and Pathway-Based Approach for Gene Discovery

    Science.gov (United States)

    Grados, Marco A.

    2010-01-01

    Objective: To provide a contemporary perspective on genetic discovery methods applied to obsessive-compulsive disorder (OCD) and Tourette syndrome (TS). Method: A review of research trends in genetics research in OCD and TS is conducted, with emphasis on novel approaches. Results: Genome-wide association studies (GWAS) are now in progress in OCD…

  15. Combining NMR and X-ray crystallography in fragment-based drug discovery: discovery of highly potent and selective BACE-1 inhibitors.

    Science.gov (United States)

    Wyss, Daniel F; Wang, Yu-Sen; Eaton, Hugh L; Strickland, Corey; Voigt, Johannes H; Zhu, Zhaoning; Stamford, Andrew W

    2012-01-01

    Fragment-based drug discovery (FBDD) has become increasingly popular over the last decade. We review here how we have used highly structure-driven fragment-based approaches to complement more traditional lead discovery to tackle high priority targets and those struggling for leads. Combining biomolecular nuclear magnetic resonance (NMR), X-ray crystallography, and molecular modeling with structure-assisted chemistry and innovative biology as an integrated approach for FBDD can solve very difficult problems, as illustrated in this chapter. Here, a successful FBDD campaign is described that has allowed the development of a clinical candidate for BACE-1, a challenging CNS drug target. Crucial to this achievement were the initial identification of a ligand-efficient isothiourea fragment through target-based NMR screening and the determination of its X-ray crystal structure in complex with BACE-1, which revealed an extensive H-bond network with the two active site aspartate residues. This detailed 3D structural information then enabled the design and validation of novel, chemically stable and accessible heterocyclic acylguanidines as aspartic acid protease inhibitor cores. Structure-assisted fragment hit-to-lead optimization yielded iminoheterocyclic BACE-1 inhibitors that possess desirable molecular properties as potential therapeutic agents to test the amyloid hypothesis of Alzheimer's disease in a clinical setting.

  16. Rule Induction-Based Knowledge Discovery for Energy Efficiency

    OpenAIRE

    Chen, Qipeng; Fan, Zhong; Kaleshi, Dritan; Armour, Simon M D

    2015-01-01

    Rule induction is a practical approach to knowledge discovery. Provided that a problem is developed, rule induction is able to return the knowledge that addresses the goal of this problem as if-then rules. The primary goals of knowledge discovery are for prediction and description. The rule format knowledge representation is easily understandable so as to enable users to make decisions. This paper presents the potential of rule induction for energy efficiency. In particular, three rule induct...

  17. NMR approaches in structure-based lead discovery: recent developments and new frontiers for targeting multi-protein complexes.

    Science.gov (United States)

    Dias, David M; Ciulli, Alessio

    2014-01-01

    Nuclear magnetic resonance (NMR) spectroscopy is a pivotal method for structure-based and fragment-based lead discovery because it is one of the most robust techniques to provide information on protein structure, dynamics and interaction at an atomic level in solution. Nowadays, in most ligand screening cascades, NMR-based methods are applied to identify and structurally validate small molecule binding. These can be high-throughput and are often used synergistically with other biophysical assays. Here, we describe current state-of-the-art in the portfolio of available NMR-based experiments that are used to aid early-stage lead discovery. We then focus on multi-protein complexes as targets and how NMR spectroscopy allows studying of interactions within the high molecular weight assemblies that make up a vast fraction of the yet untargeted proteome. Finally, we give our perspective on how currently available methods could build an improved strategy for drug discovery against such challenging targets. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.

  18. Discovery and replication of gene influences on brain structure using LASSO regression

    Directory of Open Access Journals (Sweden)

    Omid eKohannim

    2012-08-01

    Full Text Available We implemented LASSO (least absolute shrinkage and selection operator regression to evaluate gene effects in genome-wide association studies (GWAS of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer’s Disease Neuroimaging Initiative (ADNI. Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume. This was a substantially greater number of significant genes compared to those found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4 and CDH13. The top genes we identified with this method also displayed significant and widespread post-hoc effects on voxelwise, tensor-based morphometry (TBM maps of the temporal lobes. The most significantly associated gene was an autism susceptibility gene known as MACROD2. We were able to successfully replicate the effect of the MACROD2 gene in an independent cohort of 564 young, Australian healthy adult twins and siblings scanned with MRI (mean age: 23.8±2.2 SD years. In exploratory analyses, three selected SNPs in the MACROD2 gene were also significantly associated with performance intelligence quotient (PIQ. Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain.

  19. Native Mass Spectrometry in Fragment-Based Drug Discovery

    Directory of Open Access Journals (Sweden)

    Liliana Pedro

    2016-07-01

    Full Text Available The advent of native mass spectrometry (MS in 1990 led to the development of new mass spectrometry instrumentation and methodologies for the analysis of noncovalent protein–ligand complexes. Native MS has matured to become a fast, simple, highly sensitive and automatable technique with well-established utility for fragment-based drug discovery (FBDD. Native MS has the capability to directly detect weak ligand binding to proteins, to determine stoichiometry, relative or absolute binding affinities and specificities. Native MS can be used to delineate ligand-binding sites, to elucidate mechanisms of cooperativity and to study the thermodynamics of binding. This review highlights key attributes of native MS for FBDD campaigns.

  20. Native Mass Spectrometry in Fragment-Based Drug Discovery.

    Science.gov (United States)

    Pedro, Liliana; Quinn, Ronald J

    2016-07-28

    The advent of native mass spectrometry (MS) in 1990 led to the development of new mass spectrometry instrumentation and methodologies for the analysis of noncovalent protein-ligand complexes. Native MS has matured to become a fast, simple, highly sensitive and automatable technique with well-established utility for fragment-based drug discovery (FBDD). Native MS has the capability to directly detect weak ligand binding to proteins, to determine stoichiometry, relative or absolute binding affinities and specificities. Native MS can be used to delineate ligand-binding sites, to elucidate mechanisms of cooperativity and to study the thermodynamics of binding. This review highlights key attributes of native MS for FBDD campaigns.

  1. Predicting future discoveries from current scientific literature.

    Science.gov (United States)

    Petrič, Ingrid; Cestnik, Bojan

    2014-01-01

    Knowledge discovery in biomedicine is a time-consuming process starting from the basic research, through preclinical testing, towards possible clinical applications. Crossing of conceptual boundaries is often needed for groundbreaking biomedical research that generates highly inventive discoveries. We demonstrate the ability of a creative literature mining method to advance valuable new discoveries based on rare ideas from existing literature. When emerging ideas from scientific literature are put together as fragments of knowledge in a systematic way, they may lead to original, sometimes surprising, research findings. If enough scientific evidence is already published for the association of such findings, they can be considered as scientific hypotheses. In this chapter, we describe a method for the computer-aided generation of such hypotheses based on the existing scientific literature. Our literature-based discovery of NF-kappaB with its possible connections to autism was recently approved by scientific community, which confirms the ability of our literature mining methodology to accelerate future discoveries based on rare ideas from existing literature.

  2. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.

    Science.gov (United States)

    Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian

    2018-02-23

    Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate the given gene products carrying out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from a massive set of GO terms as defined by GO is a difficult challenge. To combat with this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash firstly measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The codes of HPHash are available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Direct: Ontology based discovery of responsibility and causality in legal case descriptions

    NARCIS (Netherlands)

    Breuker, J.A.P.J.; Hoekstra, R.J.; Gordon, T.

    2004-01-01

    In this paper we present DIRECT, a system forautomatic discovery of responsibility and causal relations in legal case descriptions based on LRI-Core, a core ontology that covers the main concepts that are common to all legal domains. These domains have a predominant common-sense character - the law

  4. Challenges of the information age: the impact of false discovery on pathway identification.

    Science.gov (United States)

    Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E

    2012-11-21

    Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five and ten genes sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using three, two, and one maximum intervening step between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten gene sets and 73%, 27%, and 1% using five gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.

  5. The Discovery of Aurora Kinase Inhibitor by Multi-Docking-Based Virtual Screening

    Directory of Open Access Journals (Sweden)

    Jun-Tae Kim

    2014-11-01

    Full Text Available We report the discovery of aurora kinase inhibitor using the fragment-based virtual screening by multi-docking strategy. Among a number of fragments collected from eMololecules, we found four fragment molecules showing potent activity (>50% at 100 μM against aurora kinase. Based on the explored fragment scaffold, we selected two compounds in our synthesized library and validated the biological activity against Aurora kinase.

  6. De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

    DEFF Research Database (Denmark)

    Ruzzo, Walter L; Gorodkin, Jan

    2014-01-01

    De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphas...... on an approach based on the CMfinder CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNA ncRNA s, including structured RNA elements in untranslated portions of protein-coding genes, are presented.......De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis...

  7. Genome engineering for microbial natural product discovery.

    Science.gov (United States)

    Choi, Si-Sun; Katsuyama, Yohei; Bai, Linquan; Deng, Zixin; Ohnishi, Yasuo; Kim, Eung-Soo

    2018-03-03

    The discovery and development of microbial natural products (MNPs) have played pivotal roles in the fields of human medicine and its related biotechnology sectors over the past several decades. The post-genomic era has witnessed the development of microbial genome mining approaches to isolate previously unsuspected MNP biosynthetic gene clusters (BGCs) hidden in the genome, followed by various BGC awakening techniques to visualize compound production. Additional microbial genome engineering techniques have allowed higher MNP production titers, which could complement a traditional culture-based MNP chasing approach. Here, we describe recent developments in the MNP research paradigm, including microbial genome mining, NP BGC activation, and NP overproducing cell factory design. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Strategies for enhancing the effectiveness of metagenomic-based enzyme discovery in lignocellulytic microbial communities

    Energy Technology Data Exchange (ETDEWEB)

    DeAngelis, K.M.; Gladden, J.G.; Allgaier, M.; D' haeseleer, P.; Fortney, J.L.; Reddy, A.; Hugenholtz, P.; Singer, S.W.; Vander Gheynst, J.; Silver, W.L.; Simmons, B.; Hazen, T.C.

    2010-03-01

    Producing cellulosic biofuels from plant material has recently emerged as a key U.S. Department of Energy goal. For this technology to be commercially viable on a large scale, it is critical to make production cost efficient by streamlining both the deconstruction of lignocellulosic biomass and fuel production. Many natural ecosystems efficiently degrade lignocellulosic biomass and harbor enzymes that, when identified, could be used to increase the efficiency of commercial biomass deconstruction. However, ecosystems most likely to yield relevant enzymes, such as tropical rain forest soil in Puerto Rico, are often too complex for enzyme discovery using current metagenomic sequencing technologies. One potential strategy to overcome this problem is to selectively cultivate the microbial communities from these complex ecosystems on biomass under defined conditions, generating less complex biomass-degrading microbial populations. To test this premise, we cultivated microbes from Puerto Rican soil or green waste compost under precisely defined conditions in the presence dried ground switchgrass (Panicum virgatum L.) or lignin, respectively, as the sole carbon source. Phylogenetic profiling of the two feedstock-adapted communities using SSU rRNA gene amplicon pyrosequencing or phylogenetic microarray analysis revealed that the adapted communities were significantly simplified compared to the natural communities from which they were derived. Several members of the lignin-adapted and switchgrass-adapted consortia are related to organisms previously characterized as biomass degraders, while others were from less well-characterized phyla. The decrease in complexity of these communities make them good candidates for metagenomic sequencing and will likely enable the reconstruction of a greater number of full length genes, leading to the discovery of novel lignocellulose-degrading enzymes adapted to feedstocks and conditions of interest.

  9. Incorporation of rapid thermodynamic data in fragment-based drug discovery.

    Science.gov (United States)

    Kobe, Akihiro; Caaveiro, Jose M M; Tashiro, Shinya; Kajihara, Daisuke; Kikkawa, Masato; Mitani, Tomoya; Tsumoto, Kouhei

    2013-03-14

    Fragment-based drug discovery (FBDD) has enjoyed increasing popularity in recent years. We introduce SITE (single-injection thermal extinction), a novel thermodynamic methodology that selects high-quality hits early in FBDD. SITE is a fast calorimetric competitive assay suitable for automation that captures the essence of isothermal titration calorimetry but using significantly fewer resources. We describe the principles of SITE and identify a novel family of fragment inhibitors of the enzyme ketosteroid isomerase displaying high values of enthalpic efficiency.

  10. Evaluation of Gene-Based Family-Based Methods to Detect Novel Genes Associated With Familial Late Onset Alzheimer Disease

    Directory of Open Access Journals (Sweden)

    Maria V. Fernández

    2018-04-01

    Full Text Available Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235 with late-onset Alzheimer disease (LOAD. After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L as candidate genes for familial LOAD.

  11. In silico tools used for compound selection during target-based drug discovery and development.

    Science.gov (United States)

    Caldwell, Gary W

    2015-01-01

    The target-based drug discovery process, including target selection, screening, hit-to-lead (H2L) and lead optimization stage gates, is the most common approach used in drug development. The full integration of in vitro and/or in vivo data with in silico tools across the entire process would be beneficial to R&D productivity by developing effective selection criteria and drug-design optimization strategies. This review focuses on understanding the impact and extent in the past 5 years of in silico tools on the various stage gates of the target-based drug discovery approach. There are a large number of in silico tools available for establishing selection criteria and drug-design optimization strategies in the target-based approach. However, the inconsistent use of in vitro and/or in vivo data integrated with predictive in silico multiparameter models throughout the process is contributing to R&D productivity issues. In particular, the lack of reliable in silico tools at the H2L stage gate is contributing to the suboptimal selection of viable lead compounds. It is suggested that further development of in silico multiparameter models and organizing biologists, medicinal and computational chemists into one team with a single accountable objective to expand the utilization of in silico tools in all phases of drug discovery would improve R&D productivity.

  12. Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena

    Science.gov (United States)

    Pankratius, V.; Gowanlock, M.; Blair, D. M.

    2015-12-01

    Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electronic content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).

  13. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.

    Science.gov (United States)

    Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N

    2017-01-04

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. HPV vaccines: their pathology-based discovery, benefits, and adverse effects.

    Science.gov (United States)

    Nicol, Alcina F; de Andrade, Cecilia V; Russomano, Fabio B; Rodrigues, Luana S L; Oliveira, Nathalia S; Provance, David William; Nuovo, Gerard J

    2015-12-01

    The discovery of the human papillomavirus (HPV) vaccine illustrates the power of in situ-based pathologic analysis in better understanding and curing diseases. The 2 available HPV vaccines have markedly reduced the incidence of cervical intraepithelial neoplasias, genital warts, and cervical cancer throughout the world. Concerns about HPV vaccine safety have led some physicians, health care officials, and parents to refuse providing the recommended vaccination to the target population. The aims of the study were to discuss the discovery of HPV vaccine and review scientific data related to measurable outcomes from the use of HPV vaccines. The strong type-specific immunity against HPV in humans has been known for more than 25 years. Multiple studies confirm the positive risk benefit of HPV vaccination with minimal documented adverse effects. The most common adverse effect, injection site pain, occurred in about 10% of girls and was less than the rate reported for other vaccines. Use of HPV vaccine should be expanded into more diverse populations, mainly in low-resource settings. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Antisense gene silencing

    DEFF Research Database (Denmark)

    Nielsen, Troels T; Nielsen, Jørgen E

    2013-01-01

    Since the first reports that double-stranded RNAs can efficiently silence gene expression in C. elegans, the technology of RNA interference (RNAi) has been intensively exploited as an experimental tool to study gene function. With the subsequent discovery that RNAi could also be applied...

  16. Gene Overexpression Resources in Cereals for Functional Genomics and Discovery of Useful Genes

    Directory of Open Access Journals (Sweden)

    Kiyomi Abe

    2016-09-01

    Full Text Available Identification and elucidation of functions of plant genes is valuable for both basic and applied research. In addition to natural variation in model plants, numerous loss-of-function resources have been produced by mutagenesis with chemicals, irradiation, or insertions of transposable elements or T-DNA. However, we may be unable to observe loss-of-function phenotypes for genes with functionally redundant homologs, and for those essential for growth and development. To offset such disadvantages, gain-of-function transgenic resources have been exploited. Activation-tagged lines have been generated using obligatory overexpression of endogenous genes by random insertion of an enhancer. Recent progress in DNA sequencing technology and bioinformatics has enabled the preparation of genomewide collections of full-length cDNAs (fl-cDNAs in some model species. Using the fl-cDNA clones, a novel gain-of-function strategy, Fl-cDNA OvereXpressor gene (FOX-hunting system, has been developed. A mutant phenotype in a FOX line can be directly attributed to the overexpressed fl-cDNA. Investigating a large population of FOX lines could reveal important genes conferring favorable phenotypes for crop breeding. Alternatively, a unique loss-of-function approach Chimeric REpressor gene Silencing Technology (CRES-T has been developed. In CRES-T, overexpression of a chimeric repressor, composed of the coding sequence of a transcription factor (TF and short peptide designated as the repression domain, could interfere with the action of endogenous TF in plants. Although plant TFs usually consist of gene families, CRES-T is effective, in principle, even for the TFs with functional redundancy. In this review, we focus on the current status of the gene-overexpression strategies and resources for identifying and elucidating novel functions of cereal genes. We discuss the potential of these research tools for identifying useful genes and phenotypes for application in crop

  17. Genome-wide and gene-based association studies of anxiety disorders in European and African American samples.

    Directory of Open Access Journals (Sweden)

    Takeshi Otowa

    Full Text Available Anxiety disorders (ADs are common mental disorders caused by a combination of genetic and environmental factors. Since ADs are highly comorbid with each other, partially due to shared genetic basis, studying AD phenotypes in a coordinated manner may be a powerful strategy for identifying potential genetic loci for ADs. To detect these loci, we performed genome-wide association studies (GWAS of ADs. In addition, as a complementary approach to single-locus analysis, we also conducted gene- and pathway-based analyses. GWAS data were derived from the control sample of the Molecular Genetics of Schizophrenia (MGS project (2,540 European American and 849 African American subjects genotyped on the Affymetrix GeneChip 6.0 array. We applied two phenotypic approaches: (1 categorical case-control comparisons (CC based upon psychiatric diagnoses, and (2 quantitative phenotypic factor scores (FS derived from a multivariate analysis combining information across the clinical phenotypes. Linear and logistic models were used to analyse the association with ADs using FS and CC traits, respectively. At the single locus level, no genome-wide significant association was found. A trans-population gene-based meta-analysis across both ethnic subsamples using FS identified three genes (MFAP3L on 4q32.3, NDUFAB1 and PALB2 on 16p12 with genome-wide significance (false discovery rate (FDR] <5%. At the pathway level, several terms such as transcription regulation, cytokine binding, and developmental process were significantly enriched in ADs (FDR <5%. Our approaches studying ADs as quantitative traits and utilizing the full GWAS data may be useful in identifying susceptibility genes and pathways for ADs.

  18. Developing a distributed HTML5-based search engine for geospatial resource discovery

    Science.gov (United States)

    ZHOU, N.; XIA, J.; Nebert, D.; Yang, C.; Gui, Z.; Liu, K.

    2013-12-01

    With explosive growth of data, Geospatial Cyberinfrastructure(GCI) components are developed to manage geospatial resources, such as data discovery and data publishing. However, the efficiency of geospatial resources discovery is still challenging in that: (1) existing GCIs are usually developed for users of specific domains. Users may have to visit a number of GCIs to find appropriate resources; (2) The complexity of decentralized network environment usually results in slow response and pool user experience; (3) Users who use different browsers and devices may have very different user experiences because of the diversity of front-end platforms (e.g. Silverlight, Flash or HTML). To address these issues, we developed a distributed and HTML5-based search engine. Specifically, (1)the search engine adopts a brokering approach to retrieve geospatial metadata from various and distributed GCIs; (2) the asynchronous record retrieval mode enhances the search performance and user interactivity; (3) the search engine based on HTML5 is able to provide unified access capabilities for users with different devices (e.g. tablet and smartphone).

  19. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes their finding challenging. In order to avoid the difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines with the query set size increase. This leads to the fact that only a fraction of top "peak" ChIP-Seq segments can be analyzed or the area of analysis should be narrowed. Thus, the motif discovery in massive datasets remains a challenging issue. Argo_Compute Unified Device Architecture (CUDA) web service is designed to process the massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on the high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.

  20. Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data

    Directory of Open Access Journals (Sweden)

    Gruber Stephen B

    2005-02-01

    Full Text Available Abstract Background A critical step in processing oligonucleotide microarray data is combining the information in multiple probes to produce a single number that best captures the expression level of a RNA transcript. Several systematic studies comparing multiple methods for array processing have used tightly controlled calibration data sets as the basis for comparison. Here we compare performances for seven processing methods using two data sets originally collected for disease profiling studies. An emphasis is placed on understanding sensitivity for detecting differentially expressed genes in terms of two key statistical determinants: test statistic variability for non-differentially expressed genes, and test statistic size for truly differentially expressed genes. Results In the two data sets considered here, up to seven-fold variation across the processing methods was found in the number of genes detected at a given false discovery rate (FDR. The best performing methods called up to 90% of the same genes differentially expressed, had less variable test statistics under randomization, and had a greater number of large test statistics in the experimental data. Poor performance of one method was directly tied to a tendency to produce highly variable test statistic values under randomization. Based on an overall measure of performance, two of the seven methods (Dchip and a trimmed mean approach are superior in the two data sets considered here. Two other methods (MAS5 and GCRMA-EB are inferior, while results for the other three methods are mixed. Conclusions Choice of processing method has a major impact on differential expression analysis of microarray data. Previously reported performance analyses using tightly controlled calibration data sets are not highly consistent with results reported here using data from human tissue samples. Performance of array processing methods in disease profiling and other realistic biological studies should be

  1. Finding potentially new multimorbidity patterns of psychiatric and somatic diseases: exploring the use of literature-based discovery in primary care research.

    Science.gov (United States)

    Vos, Rein; Aarts, Sil; van Mulligen, Erik; Metsemakers, Job; van Boxtel, Martin P; Verhey, Frans; van den Akker, Marjan

    2014-01-01

    Multimorbidity, the co-occurrence of two or more chronic medical conditions within a single individual, is increasingly becoming part of daily care of general medical practice. Literature-based discovery may help to investigate the patterns of multimorbidity and to integrate medical knowledge for improving healthcare delivery for individuals with co-occurring chronic conditions. To explore the usefulness of literature-based discovery in primary care research through the key-case of finding associations between psychiatric and somatic diseases relevant to general practice in a large biomedical literature database (Medline). By using literature based discovery for matching disease profiles as vectors in a high-dimensional associative concept space, co-occurrences of a broad spectrum of chronic medical conditions were matched for their potential in biomedicine. An experimental setting was chosen in parallel with expert evaluations and expert meetings to assess performance and to generate targets for integrating literature-based discovery in multidisciplinary medical research of psychiatric and somatic disease associations. Through stepwise reductions a reference set of 21,945 disease combinations was generated, from which a set of 166 combinations between psychiatric and somatic diseases was selected and assessed by text mining and expert evaluation. Literature-based discovery tools generate specific patterns of associations between psychiatric and somatic diseases: one subset was appraised as promising for further research; the other subset surprised the experts, leading to intricate discussions and further eliciting of frameworks of biomedical knowledge. These frameworks enable us to specify targets for further developing and integrating literature-based discovery in multidisciplinary research of general practice, psychology and psychiatry, and epidemiology.

  2. A critique of the molecular target-based drug discovery paradigm based on principles of metabolic control: advantages of pathway-based discovery.

    Science.gov (United States)

    Hellerstein, Marc K

    2008-01-01

    Contemporary drug discovery and development (DDD) is dominated by a molecular target-based paradigm. Molecular targets that are potentially important in disease are physically characterized; chemical entities that interact with these targets are identified by ex vivo high-throughput screening assays, and optimized lead compounds enter testing as drugs. Contrary to highly publicized claims, the ascendance of this approach has in fact resulted in the lowest rate of new drug approvals in a generation. The primary explanation for low rates of new drugs is attrition, or the failure of candidates identified by molecular target-based methods to advance successfully through the DDD process. In this essay, I advance the thesis that this failure was predictable, based on modern principles of metabolic control that have emerged and been applied most forcefully in the field of metabolic engineering. These principles, such as the robustness of flux distributions, address connectivity relationships in complex metabolic networks and make it unlikely a priori that modulating most molecular targets will have predictable, beneficial functional outcomes. These same principles also suggest, however, that unexpected therapeutic actions will be common for agents that have any effect (i.e., that complexity can be exploited therapeutically). A potential operational solution (pathway-based DDD), based on observability rather than predictability, is described, focusing on emergent properties of key metabolic pathways in vivo. Recent examples of pathway-based DDD are described. In summary, the molecular target-based DDD paradigm is built on a naïve and misleading model of biologic control and is not heuristically adequate for advancing the mission of modern therapeutics. New approaches that take account of and are built on principles described by metabolic engineers are needed for the next generation of DDD.

  3. Fragment-Based Discovery of Pyrimido[1,2-b]indazole PDE10A Inhibitors.

    Science.gov (United States)

    Chino, Ayaka; Seo, Ryushi; Amano, Yasushi; Namatame, Ichiji; Hamaguchi, Wataru; Honbou, Kazuya; Mihara, Takuma; Yamazaki, Mayako; Tomishima, Masaki; Masuda, Naoyuki

    2018-01-01

    In this study, we report the identification of potent pyrimidoindazoles as phosphodiesterase10A (PDE10A) inhibitors by using the method of fragment-based drug discovery (FBDD). The pyrazolopyridine derivative 2 was found to be a fragment hit compound which could occupy a part of the binding site of PDE10A enzyme by using the method of the X-ray co-crystal structure analysis. On the basis of the crystal structure of compound 2 and PDE10A protein, a number of compounds were synthesized and evaluated, by means of structure-activity relationship (SAR) studies, which culminated in the discovery of a novel pyrimidoindazole derivative 13 having good physicochemical properties.

  4. Representation Discovery using Harmonic Analysis

    CERN Document Server

    Mahadevan, Sridhar

    2008-01-01

    Representations are at the heart of artificial intelligence (AI). This book is devoted to the problem of representation discovery: how can an intelligent system construct representations from its experience? Representation discovery re-parameterizes the state space - prior to the application of information retrieval, machine learning, or optimization techniques - facilitating later inference processes by constructing new task-specific bases adapted to the state space geometry. This book presents a general approach to representation discovery using the framework of harmonic analysis, in particu

  5. Array based Discovery of Aptamer Pairs (Open Access Publisher’s Version)

    Science.gov (United States)

    2014-12-11

    Array-based Discovery of Aptamer Pairs Minseon Cho,†,‡ Seung Soo Oh,‡ Jeff Nie,§ Ron Stewart,§ Monte J. Radeke,⊥ Michael Eisenstein ,†,‡ Peter J...ac504076k | Anal. Chem. 2015, 87, 821−828827 (24) Cho, M.; Oh, S. S.; Nie, J.; Stewart, R.; Eisenstein , M.; Chambers, J.; Marth, J. D.; Walker, F

  6. The Matchmaker Exchange: a platform for rare disease gene discovery

    NARCIS (Netherlands)

    Philippakis, A.A.; Azzariti, D.R.; Beltran, S.; Brookes, A.J.; Brownstein, C.A.; Brudno, M.; Brunner, H.G.; Buske, O.J.; Carey, K.; Doll, C.; Dumitriu, S.; Dyke, S.O.M.; Dunnen, J.T. den; Firth, H.V.; Gibbs, R.A.; Girdea, M.; Gonzalez, M.; Haendel, M.A.; Hamosh, A.; Holm, I.A.; Huang, L.; Hurles, M.E.; Hutton, B.; Krier, J.B.; Misyura, A.; Mungall, C.J.; Paschall, J.; Paten, B.; Robinson, P.N.; Schiettecatte, F.; Sobreira, N.L.; Swaminathan, G.J.; Taschner, P.E.M.; Terry, S.F.; Washington, N.L.; Zuchner, S.; Boycott, K.M.; Rehm, H.L.

    2015-01-01

    There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of

  7. Cracking the regulatory code of biosynthetic gene clusters as a strategy for natural product discovery.

    Science.gov (United States)

    Rigali, Sébastien; Anderssen, Sinaeda; Naômé, Aymeric; van Wezel, Gilles P

    2018-01-05

    The World Health Organization (WHO) describes antibiotic resistance as "one of the biggest threats to global health, food security, and development today", as the number of multi- and pan-resistant bacteria is rising dangerously. Acquired resistance phenomena also impair antifungals, antivirals, anti-cancer drug therapy, while herbicide resistance in weeds threatens the crop industry. On the positive side, it is likely that the chemical space of natural products goes far beyond what has currently been discovered. This idea is fueled by genome sequencing of microorganisms which unveiled numerous so-called cryptic biosynthetic gene clusters (BGCs), many of which are transcriptionally silent under laboratory culture conditions, and by the fact that most bacteria cannot yet be cultivated in the laboratory. However, brute force antibiotic discovery does not yield the same results as it did in the past, and researchers have had to develop creative strategies in order to unravel the hidden potential of microorganisms such as Streptomyces and other antibiotic-producing microorganisms. Identifying the cis elements and their corresponding transcription factors(s) involved in the control of BGCs through bioinformatic approaches is a promising strategy. Theoretically, we are a few 'clicks' away from unveiling the culturing conditions or genetic changes needed to activate the production of cryptic metabolites or increase the production yield of known compounds to make them economically viable. In this opinion article, we describe and illustrate the idea beyond 'cracking' the regulatory code for natural product discovery, by presenting a series of proofs of concept, and discuss what still should be achieved to increase the rate of success of this strategy. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. The impact of genetics on future drug discovery in schizophrenia.

    Science.gov (United States)

    Matsumoto, Mitsuyuki; Walton, Noah M; Yamada, Hiroshi; Kondo, Yuji; Marek, Gerard J; Tajinda, Katsunori

    2017-07-01

    Failures of investigational new drugs (INDs) for schizophrenia have left huge unmet medical needs for patients. Given the recent lackluster results, it is imperative that new drug discovery approaches (and resultant drug candidates) target pathophysiological alterations that are shared in specific, stratified patient populations that are selected based on pre-identified biological signatures. One path to implementing this paradigm is achievable by leveraging recent advances in genetic information and technologies. Genome-wide exome sequencing and meta-analysis of single nucleotide polymorphism (SNP)-based association studies have already revealed rare deleterious variants and SNPs in patient populations. Areas covered: Herein, the authors review the impact that genetics have on the future of schizophrenia drug discovery. The high polygenicity of schizophrenia strongly indicates that this disease is biologically heterogeneous so the identification of unique subgroups (by patient stratification) is becoming increasingly necessary for future investigational new drugs. Expert opinion: The authors propose a pathophysiology-based stratification of genetically-defined subgroups that share deficits in particular biological pathways. Existing tools, including lower-cost genomic sequencing and advanced gene-editing technology render this strategy ever more feasible. Genetically complex psychiatric disorders such as schizophrenia may also benefit from synergistic research with simpler monogenic disorders that share perturbations in similar biological pathways.

  9. Fragment Based Strategies for Discovery of Novel HIV-1 Reverse Transcriptase and Integrase Inhibitors.

    Science.gov (United States)

    Latham, Catherine F; La, Jennifer; Tinetti, Ricky N; Chalmers, David K; Tachedjian, Gilda

    2016-01-01

    Human immunodeficiency virus (HIV) remains a global health problem. While combined antiretroviral therapy has been successful in controlling the virus in patients, HIV can develop resistance to drugs used for treatment, rendering available drugs less effective and limiting treatment options. Initiatives to find novel drugs for HIV treatment are ongoing, although traditional drug design approaches often focus on known binding sites for inhibition of established drug targets like reverse transcriptase and integrase. These approaches tend towards generating more inhibitors in the same drug classes already used in the clinic. Lack of diversity in antiretroviral drug classes can result in limited treatment options, as cross-resistance can emerge to a whole drug class in patients treated with only one drug from that class. A fresh approach in the search for new HIV-1 drugs is fragment-based drug discovery (FBDD), a validated strategy for drug discovery based on using smaller libraries of low molecular weight molecules (FBDD is aimed at not only finding novel drug scaffolds, but also probing the target protein to find new, often allosteric, inhibitory binding sites. Several fragment-based strategies have been successful in identifying novel inhibitory sites or scaffolds for two proven drug targets for HIV-1, reverse transcriptase and integrase. While any FBDD-generated HIV-1 drugs have yet to enter the clinic, recent FBDD initiatives against these two well-characterised HIV-1 targets have reinvigorated antiretroviral drug discovery and the search for novel classes of HIV-1 drugs.

  10. Integration of lyoplate based flow cytometry and computational analysis for standardized immunological biomarker discovery.

    Directory of Open Access Journals (Sweden)

    Federica Villanova

    Full Text Available Discovery of novel immune biomarkers for monitoring of disease prognosis and response to therapy in immune-mediated inflammatory diseases is an important unmet clinical need. Here, we establish a novel framework for immunological biomarker discovery, comparing a conventional (liquid flow cytometry platform (CFP and a unique lyoplate-based flow cytometry platform (LFP in combination with advanced computational data analysis. We demonstrate that LFP had higher sensitivity compared to CFP, with increased detection of cytokines (IFN-γ and IL-10 and activation markers (Foxp3 and CD25. Fluorescent intensity of cells stained with lyophilized antibodies was increased compared to cells stained with liquid antibodies. LFP, using a plate loader, allowed medium-throughput processing of samples with comparable intra- and inter-assay variability between platforms. Automated computational analysis identified novel immunophenotypes that were not detected with manual analysis. Our results establish a new flow cytometry platform for standardized and rapid immunological biomarker discovery with wide application to immune-mediated diseases.

  11. Integration of lyoplate based flow cytometry and computational analysis for standardized immunological biomarker discovery.

    Science.gov (United States)

    Villanova, Federica; Di Meglio, Paola; Inokuma, Margaret; Aghaeepour, Nima; Perucha, Esperanza; Mollon, Jennifer; Nomura, Laurel; Hernandez-Fuentes, Maria; Cope, Andrew; Prevost, A Toby; Heck, Susanne; Maino, Vernon; Lord, Graham; Brinkman, Ryan R; Nestle, Frank O

    2013-01-01

    Discovery of novel immune biomarkers for monitoring of disease prognosis and response to therapy in immune-mediated inflammatory diseases is an important unmet clinical need. Here, we establish a novel framework for immunological biomarker discovery, comparing a conventional (liquid) flow cytometry platform (CFP) and a unique lyoplate-based flow cytometry platform (LFP) in combination with advanced computational data analysis. We demonstrate that LFP had higher sensitivity compared to CFP, with increased detection of cytokines (IFN-γ and IL-10) and activation markers (Foxp3 and CD25). Fluorescent intensity of cells stained with lyophilized antibodies was increased compared to cells stained with liquid antibodies. LFP, using a plate loader, allowed medium-throughput processing of samples with comparable intra- and inter-assay variability between platforms. Automated computational analysis identified novel immunophenotypes that were not detected with manual analysis. Our results establish a new flow cytometry platform for standardized and rapid immunological biomarker discovery with wide application to immune-mediated diseases.

  12. Cancer in silico drug discovery: a systems biology tool for identifying candidate drugs to target specific molecular tumor subtypes.

    Science.gov (United States)

    San Lucas, F Anthony; Fowler, Jerry; Chang, Kyle; Kopetz, Scott; Vilar, Eduardo; Scheet, Paul

    2014-12-01

    Large-scale cancer datasets such as The Cancer Genome Atlas (TCGA) allow researchers to profile tumors based on a wide range of clinical and molecular characteristics. Subsequently, TCGA-derived gene expression profiles can be analyzed with the Connectivity Map (CMap) to find candidate drugs to target tumors with specific clinical phenotypes or molecular characteristics. This represents a powerful computational approach for candidate drug identification, but due to the complexity of TCGA and technology differences between CMap and TCGA experiments, such analyses are challenging to conduct and reproduce. We present Cancer in silico Drug Discovery (CiDD; scheet.org/software), a computational drug discovery platform that addresses these challenges. CiDD integrates data from TCGA, CMap, and Cancer Cell Line Encyclopedia (CCLE) to perform computational drug discovery experiments, generating hypotheses for the following three general problems: (i) determining whether specific clinical phenotypes or molecular characteristics are associated with unique gene expression signatures; (ii) finding candidate drugs to repress these expression signatures; and (iii) identifying cell lines that resemble the tumors being studied for subsequent in vitro experiments. The primary input to CiDD is a clinical or molecular characteristic. The output is a biologically annotated list of candidate drugs and a list of cell lines for in vitro experimentation. We applied CiDD to identify candidate drugs to treat colorectal cancers harboring mutations in BRAF. CiDD identified EGFR and proteasome inhibitors, while proposing five cell lines for in vitro testing. CiDD facilitates phenotype-driven, systematic drug discovery based on clinical and molecular data from TCGA. ©2014 American Association for Cancer Research.

  13. Sex-specific associations between particulate matter exposure and gene expression in independent discovery and validation cohorts of middle-aged men and women

    DEFF Research Database (Denmark)

    Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria

    2017-01-01

    Background: Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. Objectives: Identification of transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Methods: Microarray analyses were performed in 98...... healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women...

  14. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface

    Science.gov (United States)

    Rot, Gregor; Parikh, Anup; Curk, Tomaz; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaz

    2009-01-01

    Background Bioinformatics often leverages on recent advancements in computer science to support biologists in their scientific discovery process. Such efforts include the development of easy-to-use web interfaces to biomedical databases. Recent advancements in interactive web technologies require us to rethink the standard submit-and-wait paradigm, and craft bioinformatics web applications that share analytical and interactive power with their desktop relatives, while retaining simplicity and availability. Results We have developed dictyExpress, a web application that features a graphical, highly interactive explorative interface to our database that consists of more than 1000 Dictyostelium discoideum gene expression experiments. In dictyExpress, the user can select experiments and genes, perform gene clustering, view gene expression profiles across time, view gene co-expression networks, perform analyses of Gene Ontology term enrichment, and simultaneously display expression profiles for a selected gene in various experiments. Most importantly, these tasks are achieved through web applications whose components are seamlessly interlinked and immediately respond to events triggered by the user, thus providing a powerful explorative data analysis environment. Conclusion dictyExpress is a precursor for a new generation of web-based bioinformatics applications with simple but powerful interactive interfaces that resemble that of the modern desktop. While dictyExpress serves mainly the Dictyostelium research community, it is relatively easy to adapt it to other datasets. We propose that the design ideas behind dictyExpress will influence the development of similar applications for other model organisms. PMID:19706156

  15. The current state of GPCR-based drug discovery to treat metabolic disease.

    Science.gov (United States)

    Sloop, Kyle W; Emmerson, Paul J; Statnick, Michael A; Willard, Francis S

    2018-02-02

    One approach of modern drug discovery is to identify agents that enhance or diminish signal transduction cascades in various cell types and tissues by modulating the activity of GPCRs. This strategy has resulted in the development of new medicines to treat many conditions, including cardiovascular disease, psychiatric disorders, HIV/AIDS, certain forms of cancer and Type 2 diabetes mellitus (T2DM). These successes justify further pursuit of GPCRs as disease targets and provide key learning that should help guide identifying future therapeutic agents. This report reviews the current landscape of GPCR drug discovery with emphasis on efforts aimed at developing new molecules for treating T2DM and obesity. We analyse historical efforts to generate GPCR-based drugs to treat metabolic disease in terms of causal factors leading to success and failure in this endeavour. © 2018 The British Pharmacological Society.

  16. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    Science.gov (United States)

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  17. Computational method for discovery of estrogen responsive genes

    DEFF Research Database (Denmark)

    Tang, Suisheng; Tan, Sin Lam; Ramadoss, Suresh Kumar

    2004-01-01

    Estrogen has a profound impact on human physiology and affects numerous genes. The classical estrogen reaction is mediated by its receptors (ERs), which bind to the estrogen response elements (EREs) in target gene's promoter region. Due to tedious and expensive experiments, a limited number of hu...

  18. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database.

    Science.gov (United States)

    Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan

    2017-03-15

    The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection

    KAUST Repository

    Dhavala, Soma S.

    2010-09-01

    Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflatedPoisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models-one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.

  20. Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection

    KAUST Repository

    Dhavala, Soma S.; Datta, Sujay; Mallick, Bani K.; Carroll, Raymond J.; Khare, Sangeeta; Lawhon, Sara D.; Adams, L. Garry

    2010-01-01

    Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflatedPoisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models-one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.

  1. Recent development of computational resources for new antibiotics discovery

    DEFF Research Database (Denmark)

    Kim, Hyun Uk; Blin, Kai; Lee, Sang Yup

    2017-01-01

    Understanding a complex working mechanism of biosynthetic gene clusters (BGCs) encoding secondary metabolites is a key to discovery of new antibiotics. Computational resources continue to be developed in order to better process increasing volumes of genome and chemistry data, and thereby better...

  2. SpirPep: an in silico digestion-based platform to assist bioactive peptides discovery from a genome-wide database.

    Science.gov (United States)

    Anekthanakul, Krittima; Hongsthong, Apiradee; Senachak, Jittisak; Ruengjitchatchawalya, Marasri

    2018-04-20

    Bioactive peptides, including biological sources-derived peptides with different biological activities, are protein fragments that influence the functions or conditions of organisms, in particular humans and animals. Conventional methods of identifying bioactive peptides are time-consuming and costly. To quicken the processes, several bioinformatics tools are recently used to facilitate screening of the potential peptides prior their activity assessment in vitro and/or in vivo. In this study, we developed an efficient computational method, SpirPep, which offers many advantages over the currently available tools. The SpirPep web application tool is a one-stop analysis and visualization facility to assist bioactive peptide discovery. The tool is equipped with 15 customized enzymes and 1-3 miscleavage options, which allows in silico digestion of protein sequences encoded by protein-coding genes from single, multiple, or genome-wide scaling, and then directly classifies the peptides by bioactivity using an in-house database that contains bioactive peptides collected from 13 public databases. With this tool, the resulting peptides are categorized by each selected enzyme, and shown in a tabular format where the peptide sequences can be tracked back to their original proteins. The developed tool and webpages are coded in PHP and HTML with CSS/JavaScript. Moreover, the tool allows protein-peptide alignment visualization by Generic Genome Browser (GBrowse) to display the region and details of the proteins and peptides within each parameter, while considering digestion design for the desirable bioactivity. SpirPep is efficient; it takes less than 20 min to digest 3000 proteins (751,860 amino acids) with 15 enzymes and three miscleavages for each enzyme, and only a few seconds for single enzyme digestion. Obviously, the tool identified more bioactive peptides than that of the benchmarked tool; an example of validated pentapeptide (FLPIL) from LC-MS/MS was demonstrated. The

  3. A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.

    Science.gov (United States)

    Kothari, Cartik R; Payne, Philip R O

    2015-01-01

    In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.

  4. Novel Technology for Protein-Protein Interaction-based Targeted Drug Discovery

    Directory of Open Access Journals (Sweden)

    Jung Me Hwang

    2011-12-01

    Full Text Available We have developed a simple but highly efficient in-cell protein-protein interaction (PPI discovery system based on the translocation properties of protein kinase C- and its C1a domain in live cells. This system allows the visual detection of trimeric and dimeric protein interactions including cytosolic, nuclear, and/or membrane proteins with their cognate ligands. In addition, this system can be used to identify pharmacological small compounds that inhibit specific PPIs. These properties make this PPI system an attractive tool for screening drug candidates and mapping the protein interactome.

  5. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    Science.gov (United States)

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.

  6. Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing

    DEFF Research Database (Denmark)

    Pang, Chi; Tay, Aidan; Aya, Carlos

    2014-01-01

    contigs, along with RNA-seq reads. This is done in the Integrated Genome Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates...... the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus . For higher eukaryotes, the PG Nexus facilitates gene...

  7. Performance Evaluation of a Cluster-Based Service Discovery Protocol for Heterogeneous Wireless Sensor Networks

    NARCIS (Netherlands)

    Marin Perianu, Raluca; Scholten, Johan; Havinga, Paul J.M.; Hartel, Pieter H.

    2006-01-01

    Abstract—This paper evaluates the performance in terms of resource consumption of a service discovery protocol proposed for heterogeneous Wireless Sensor Networks (WSNs). The protocol is based on a clustering structure, which facilitates the construction of a distributed directory. Nodes with higher

  8. When fragments link : a bibliometric perspective on the development of fragment-based drug discovery

    NARCIS (Netherlands)

    Romasanta, A.K.S.; van der Sijde, P.C.; Hellsten, I.; Hubbard, Roderick E.; Keseru, Gyorgy M.; van Muijlwijk-Koezen, Jacqueline E.; de Esch, I.J.P.

    2018-01-01

    Fragment-based drug discovery (FBDD) is a highly interdisciplinary field, rich in ideas integrated from pharmaceutical sciences, chemistry, biology, and physics, among others. To enrich our understanding of the development of the field, we used bibliometric techniques to analyze 3642 publications in

  9. Infrared and Raman Spectroscopy: A Discovery-Based Activity for the General Chemistry Curriculum

    Science.gov (United States)

    Borgsmiller, Karen L.; O'Connell, Dylan J.; Klauenberg, Kathryn M.; Wilson, Peter M.; Stromberg, Christopher J.

    2012-01-01

    A discovery-based method is described for incorporating the concepts of IR and Raman spectroscopy into the general chemistry curriculum. Students use three sets of springs to model the properties of single, double, and triple covalent bonds. Then, Gaussian 03W molecular modeling software is used to illustrate the relationship between bond…

  10. Research on Hotspot Discovery in Internet Public Opinions Based on Improved -Means

    Directory of Open Access Journals (Sweden)

    Gensheng Wang

    2013-01-01

    Full Text Available How to discover hotspot in the Internet public opinions effectively is a hot research field for the researchers related which plays a key role for governments and corporations to find useful information from mass data in the Internet. An improved -means algorithm for hotspot discovery in internet public opinions is presented based on the analysis of existing defects and calculation principle of original -means algorithm. First, some new methods are designed to preprocess website texts, select and express the characteristics of website texts, and define the similarity between two website texts, respectively. Second, clustering principle and the method of initial classification centers selection are analyzed and improved in order to overcome the limitations of original -means algorithm. Finally, the experimental results verify that the improved algorithm can improve the clustering stability and classification accuracy of hotspot discovery in internet public opinions when used in practice.

  11. A new approach to the rationale discovery of polymeric biomaterials

    Science.gov (United States)

    Kohn, Joachim; Welsh, William J.; Knight, Doyle

    2007-01-01

    This paper attempts to illustrate both the need for new approaches to biomaterials discovery as well as the significant promise inherent in the use of combinatorial and computational design strategies. The key observation of this Leading Opinion Paper is that the biomaterials community has been slow to embrace advanced biomaterials discovery tools such as combinatorial methods, high throughput experimentation, and computational modeling in spite of the significant promise shown by these discovery tools in materials science, medicinal chemistry and the pharmaceutical industry. It seems that the complexity of living cells and their interactions with biomaterials has been a conceptual as well as a practical barrier to the use of advanced discovery tools in biomaterials science. However, with the continued increase in computer power, the goal of predicting the biological response of cells in contact with biomaterials surfaces is within reach. Once combinatorial synthesis, high throughput experimentation, and computational modeling are integrated into the biomaterials discovery process, a significant acceleration is possible in the pace of development of improved medical implants, tissue regeneration scaffolds, and gene/drug delivery systems. PMID:17644176

  12. Gene discovery for the carcinogenic human liver fluke, Opisthorchis viverrini

    Directory of Open Access Journals (Sweden)

    Gasser Robin B

    2007-06-01

    Full Text Available Abstract Background Cholangiocarcinoma (CCA – cancer of the bile ducts – is associated with chronic infection with the liver fluke, Opisthorchis viverrini. Despite being the only eukaryote that is designated as a 'class I carcinogen' by the International Agency for Research on Cancer, little is known about its genome. Results Approximately 5,000 randomly selected cDNAs from the adult stage of O. viverrini were characterized and accounted for 1,932 contigs, representing ~14% of the entire transcriptome, and, presently, the largest sequence dataset for any species of liver fluke. Twenty percent of contigs were assigned GO classifications. Abundantly represented protein families included those involved in physiological functions that are essential to parasitism, such as anaerobic respiration, reproduction, detoxification, surface maintenance and feeding. GO assignments were well conserved in relation to other parasitic flukes, however, some categories were over-represented in O. viverrini, such as structural and motor proteins. An assessment of evolutionary relationships showed that O. viverrini was more similar to other parasitic (Clonorchis sinensis and Schistosoma japonicum than to free-living (Schmidtea mediterranea flatworms, and 105 sequences had close homologues in both parasitic species but not in S. mediterranea. A total of 164 O. viverrini contigs contained ORFs with signal sequences, many of which were platyhelminth-specific. Examples of convergent evolution between host and parasite secreted/membrane proteins were identified as were homologues of vaccine antigens from other helminths. Finally, ORFs representing secreted proteins with known roles in tumorigenesis were identified, and these might play roles in the pathogenesis of O. viverrini-induced CCA. Conclusion This gene discovery effort for O. viverrini should expedite molecular studies of cholangiocarcinogenesis and accelerate research focused on developing new interventions

  13. A history of the DNA repair and mutagenesis field: The discovery of base excision repair.

    Science.gov (United States)

    Friedberg, Errol C

    2016-01-01

    This article reviews the early history of the discovery of an DNA repair pathway designated as base excision repair (BER), since in contrast to the enzyme-catalyzed removal of damaged bases from DNA as nucleotides [called nucleotide excision repair (NER)], BER involves the removal of damaged or inappropriate bases, such as the presence of uracil instead of thymine, from DNA as free bases. Copyright © 2015. Published by Elsevier B.V.

  14. Scuba: scalable kernel-based gene prioritization.

    Science.gov (United States)

    Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio

    2018-01-25

    The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge, however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to efficiently deal both with a large amount of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba integrates also a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba .

  15. Harvest: an open platform for developing web-based biomedical data discovery and reporting applications.

    Science.gov (United States)

    Pennington, Jeffrey W; Ruth, Byron; Italia, Michael J; Miller, Jeffrey; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; White, Peter S

    2014-01-01

    Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables, but without the need to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of highly dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu.

  16. Price discovery in a continuous-time setting

    DEFF Research Database (Denmark)

    Dias, Gustavo Fruet; Fernandes, Marcelo; Scherrer, Cristina

    We formulate a continuous-time price discovery model in which the price discovery measure varies (stochastically) at daily frequency. We estimate daily measures of price discovery using a kernel-based OLS estimator instead of running separate daily VECM regressions as standard in the literature. We...... show that our estimator is not only consistent, but also outperforms the standard daily VECM in finite samples. We illustrate our theoretical findings by studying the price discovery process of 10 actively traded stocks in the U.S. from 2007 to 2013....

  17. Antibody informatics for drug discovery

    DEFF Research Database (Denmark)

    Shirai, Hiroki; Prades, Catherine; Vita, Randi

    2014-01-01

    to the antibody science in every project in antibody drug discovery. Recent experimental technologies allow for the rapid generation of large-scale data on antibody sequences, affinity, potency, structures, and biological functions; this should accelerate drug discovery research. Therefore, a robust bioinformatic...... infrastructure for these large data sets has become necessary. In this article, we first identify and discuss the typical obstacles faced during the antibody drug discovery process. We then summarize the current status of three sub-fields of antibody informatics as follows: (i) recent progress in technologies...... for antibody rational design using computational approaches to affinity and stability improvement, as well as ab-initio and homology-based antibody modeling; (ii) resources for antibody sequences, structures, and immune epitopes and open drug discovery resources for development of antibody drugs; and (iii...

  18. Discovery and early development of AVI-7537 and AVI-7288 for the treatment of Ebola virus and Marburg virus infections.

    Science.gov (United States)

    Iversen, Patrick L; Warren, Travis K; Wells, Jay B; Garza, Nicole L; Mourich, Dan V; Welch, Lisa S; Panchal, Rekha G; Bavari, Sina

    2012-11-06

    There are no currently approved treatments for filovirus infections. In this study we report the discovery process which led to the development of antisense Phosphorodiamidate Morpholino Oligomers (PMOs) AVI-6002 (composed of AVI-7357 and AVI-7539) and AVI-6003 (composed of AVI-7287 and AVI-7288) targeting Ebola virus and Marburg virus respectively. The discovery process involved identification of optimal transcript binding sites for PMO based RNA-therapeutics followed by screening for effective viral gene target in mouse and guinea pig models utilizing adapted viral isolates. An evolution of chemical modifications were tested, beginning with simple Phosphorodiamidate Morpholino Oligomers (PMO) transitioning to cell penetrating peptide conjugated PMOs (PPMO) and ending with PMOplus containing a limited number of positively charged linkages in the PMO structure. The initial lead compounds were combinations of two agents targeting separate genes. In the final analysis, a single agent for treatment of each virus was selected, AVI-7537 targeting the VP24 gene of Ebola virus and AVI-7288 targeting NP of Marburg virus, and are now progressing into late stage clinical development as the optimal therapeutic candidates.

  19. An online conserved SSR discovery through cross-species comparison

    Directory of Open Access Journals (Sweden)

    Tun-Wen Pai

    2009-02-01

    Full Text Available Tun-Wen Pai1, Chien-Ming Chen1, Meng-Chang Hsiao1, Ronshan Cheng2, Wen-Shyong Tzou3, Chin-Hua Hu31Department of Computer Science and Engineering; 2Department of Aquaculture, 3Institute of Bioscience and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan, Republic of ChinaAbstract: Simple sequence repeats (SSRs play important roles in gene regulation and genome evolution. Although there exist several online resources for SSR mining, most of them only extract general SSR patterns without providing functional information. Here, an online search tool, CG-SSR (Comparative Genomics SSR discovery, has been developed for discovering potential functional SSRs from vertebrate genomes through cross-species comparison. In addition to revealing SSR candidates in conserved regions among various species, it also combines accurate coordinate and functional genomics information. CG-SSR is the first comprehensive and efficient online tool for conserved SSR discovery.Keywords: microsatellites, genome, comparative genomics, functional SSR, gene ontology, conserved region

  20. Finding gene regulatory network candidates using the gene expression knowledge base.

    Science.gov (United States)

    Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin

    2014-12-10

    Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.

  1. De novo Transcriptome Assembly of Common Wild Rice (Oryza rufipogon Griff.) and Discovery of Drought-Response Genes in Root Tissue Based on Transcriptomic Data.

    Science.gov (United States)

    Tian, Xin-Jie; Long, Yan; Wang, Jiao; Zhang, Jing-Wen; Wang, Yan-Yan; Li, Wei-Min; Peng, Yu-Fa; Yuan, Qian-Hua; Pei, Xin-Wu

    2015-01-01

    The perennial O. rufipogon (common wild rice), which is considered to be the ancestor of Asian cultivated rice species, contains many useful genetic resources, including drought resistance genes. However, few studies have identified the drought resistance and tissue-specific genes in common wild rice. In this study, transcriptome sequencing libraries were constructed, including drought-treated roots (DR) and control leaves (CL) and roots (CR). Using Illumina sequencing technology, we generated 16.75 million bases of high-quality sequence data for common wild rice and conducted de novo assembly and annotation of genes without prior genome information. These reads were assembled into 119,332 unigenes with an average length of 715 bp. A total of 88,813 distinct sequences (74.42% of unigenes) significantly matched known genes in the NCBI NT database. Differentially expressed gene (DEG) analysis showed that 3617 genes were up-regulated and 4171 genes were down-regulated in the CR library compared with the CL library. Among the DEGs, 535 genes were expressed in roots but not in shoots. A similar comparison between the DR and CR libraries showed that 1393 genes were up-regulated and 315 genes were down-regulated in the DR library compared with the CR library. Finally, 37 genes that were specifically expressed in roots were screened after comparing the DEGs identified in the above-described analyses. This study provides a transcriptome sequence resource for common wild rice plants and establishes a digital gene expression profile of wild rice plants under drought conditions using the assembled transcriptome data as a reference. Several tissue-specific and drought-stress-related candidate genes were identified, representing a fully characterized transcriptome and providing a valuable resource for genetic and genomic studies in plants.

  2. Discovery in a World of Mashups

    Science.gov (United States)

    King, T. A.; Ritschel, B.; Hourcle, J. A.; Moon, I. S.

    2014-12-01

    When the first digital information was stored electronically, discovery of what existed was through file names and the organization of the file system. With the advent of networks, digital information was shared on a wider scale, but discovery remained based on file and folder names. With a growing number of information sources, named based discovery quickly became ineffective. The keyword based search engine was one of the first types of a mashup in the world of Web 1.0. Embedded links from one document to another with prescribed relationships between files and the world of Web 2.0 was formed. Search engines like Google used the links to improve search results and a worldwide mashup was formed. While a vast improvement, the need for semantic (meaning rich) discovery was clear, especially for the discovery of scientific data. In response, every science discipline defined schemas to describe their type of data. Some core schemas where shared, but most schemas are custom tailored even though they share many common concepts. As with the networking of information sources, science increasingly relies on data from multiple disciplines. So there is a need to bring together multiple sources of semantically rich information. We explore how harvesting, conceptual mapping, facet based search engines, search term promotion, and style sheets can be combined to create the next generation of mashups in the emerging world of Web 3.0. We use NASA's Planetary Data System and NASA's Heliophysics Data Environment to illustrate how to create a multi-discipline mash-up.

  3. Identification of Candidate B-Lymphoma Genes by Cross-Species Gene Expression Profiling

    Science.gov (United States)

    Tompkins, Van S.; Han, Seong-Su; Olivier, Alicia; Syrbu, Sergei; Bair, Thomas; Button, Anna; Jacobus, Laura; Wang, Zebin; Lifton, Samuel; Raychaudhuri, Pradip; Morse, Herbert C.; Weiner, George; Link, Brian; Smith, Brian J.; Janz, Siegfried

    2013-01-01

    Comparative genome-wide expression profiling of malignant tumor counterparts across the human-mouse species barrier has a successful track record as a gene discovery tool in liver, breast, lung, prostate and other cancers, but has been largely neglected in studies on neoplasms of mature B-lymphocytes such as diffuse large B cell lymphoma (DLBCL) and Burkitt lymphoma (BL). We used global gene expression profiles of DLBCL-like tumors that arose spontaneously in Myc-transgenic C57BL/6 mice as a phylogenetically conserved filter for analyzing the human DLBCL transcriptome. The human and mouse lymphomas were found to have 60 concordantly deregulated genes in common, including 8 genes that Cox hazard regression analysis associated with overall survival in a published landmark dataset of DLBCL. Genetic network analysis of the 60 genes followed by biological validation studies indicate FOXM1 as a candidate DLBCL and BL gene, supporting a number of studies contending that FOXM1 is a therapeutic target in mature B cell tumors. Our findings demonstrate the value of the “mouse filter” for genomic studies of human B-lineage neoplasms for which a vast knowledge base already exists. PMID:24130802

  4. Upnp-Based Discovery And Management Of Hypervisors And Virtual Machines

    Directory of Open Access Journals (Sweden)

    Sławomir Zieliński

    2011-01-01

    Full Text Available The paper introduces a Universal Plug and Play based discovery and management toolkitthat facilitates collaboration between cloud infrastructure providers and users. The presentedtools construct a unified hierarchy of devices and their management-related services, thatrepresents the current deployment of users’ (virtual infrastructures in the provider’s (physicalinfrastructure as well as the management interfaces of respective devices. The hierarchycan be used to enhance the capabilities of the provider’s infrastructure management system.To maintain user independence, the set of management operations exposed by a particulardevice is always defined by the device owner (either the provider or user.

  5. Bayesian centroid estimation for motif discovery.

    Science.gov (United States)

    Carvalho, Luis

    2013-01-01

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  6. Bayesian centroid estimation for motif discovery.

    Directory of Open Access Journals (Sweden)

    Luis Carvalho

    Full Text Available Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.

  7. Arrayed antibody library technology for therapeutic biologic discovery.

    Science.gov (United States)

    Bentley, Cornelia A; Bazirgan, Omar A; Graziano, James J; Holmes, Evan M; Smider, Vaughn V

    2013-03-15

    Traditional immunization and display antibody discovery methods rely on competitive selection amongst a pool of antibodies to identify a lead. While this approach has led to many successful therapeutic antibodies, targets have been limited to proteins which are easily purified. In addition, selection driven discovery has produced a narrow range of antibody functionalities focused on high affinity antagonism. We review the current progress in developing arrayed protein libraries for screening-based, rather than selection-based, discovery. These single molecule per microtiter well libraries have been screened in multiplex formats against both purified antigens and directly against targets expressed on the cell surface. This facilitates the discovery of antibodies against therapeutically interesting targets (GPCRs, ion channels, and other multispanning membrane proteins) and epitopes that have been considered poorly accessible to conventional discovery methods. Copyright © 2013. Published by Elsevier Inc.

  8. Discovery, characterization and expression of a novel zebrafish gene, znfr, important for notochord formation.

    Science.gov (United States)

    Xu, Yan; Zou, Peng; Liu, Yao; Deng, Fengjiao

    2010-06-01

    Genes specifically expressed in the notochord may be crucial for proper notochord development. Using the digital differential display program offered by the National Center for Biotechnology Information, we identified a novel EST sequence from a zebrafish ovary library (No. XM_701450). The full-length cDNA of this transcript was cloned by performing 3' and 5'-RACE and was further confirmed by PCR and sequencing. The resulting 614 bp gene was found to encode a novel 94 amino acid protein that did not share significant homology with any other known protein. Characterization of the genomic sequence revealed that the gene spanned 4.9 kb and was composed of four exons and three introns. RT-PCR gene expression analysis revealed that our gene of interest was expressed in ovary, kidney, brain, mature oocytes and during the early stages of embryogenesis. During embryonic development, znfr mRNA was found to be expressed in the embryonic shield, chordamesoderm and the vacuolated notochord cells by in situ hybridization. Based on this information, we hypothesize that this novel gene is an important maternal factor required for zebrafish notochord formation during early embryonic development. We have thus named this gene znfr (zebrafish notochord formation related).

  9. Cross-organism learning method to discover new gene functionalities.

    Science.gov (United States)

    Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro

    2016-04-01

    Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In the last years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones

  10. Computational Methods Used in Hit-to-Lead and Lead Optimization Stages of Structure-Based Drug Discovery.

    Science.gov (United States)

    Heifetz, Alexander; Southey, Michelle; Morao, Inaki; Townsend-Nicholson, Andrea; Bodkin, Mike J

    2018-01-01

    GPCR modeling approaches are widely used in the hit-to-lead (H2L) and lead optimization (LO) stages of drug discovery. The aims of these modeling approaches are to predict the 3D structures of the receptor-ligand complexes, to explore the key interactions between the receptor and the ligand and to utilize these insights in the design of new molecules with improved binding, selectivity or other pharmacological properties. In this book chapter, we present a brief survey of key computational approaches integrated with hierarchical GPCR modeling protocol (HGMP) used in hit-to-lead (H2L) and in lead optimization (LO) stages of structure-based drug discovery (SBDD). We outline the differences in modeling strategies used in H2L and LO of SBDD and illustrate how these tools have been applied in three drug discovery projects.

  11. A high-density transcript linkage map with 1,845 expressed genes positioned by microarray-based Single Feature Polymorphisms (SFP) in Eucalyptus

    Science.gov (United States)

    2011-01-01

    Background Technological advances are progressively increasing the application of genomics to a wider array of economically and ecologically important species. High-density maps enriched for transcribed genes facilitate the discovery of connections between genes and phenotypes. We report the construction of a high-density linkage map of expressed genes for the heterozygous genome of Eucalyptus using Single Feature Polymorphism (SFP) markers. Results SFP discovery and mapping was achieved using pseudo-testcross screening and selective mapping to simultaneously optimize linkage mapping and microarray costs. SFP genotyping was carried out by hybridizing complementary RNA prepared from 4.5 year-old trees xylem to an SFP array containing 103,000 25-mer oligonucleotide probes representing 20,726 unigenes derived from a modest size expressed sequence tags collection. An SFP-mapping microarray with 43,777 selected candidate SFP probes representing 15,698 genes was subsequently designed and used to genotype SFPs in a larger subset of the segregating population drawn by selective mapping. A total of 1,845 genes were mapped, with 884 of them ordered with high likelihood support on a framework map anchored to 180 microsatellites with average density of 1.2 cM. Using more probes per unigene increased by two-fold the likelihood of detecting segregating SFPs eventually resulting in more genes mapped. In silico validation showed that 87% of the SFPs map to the expected location on the 4.5X draft sequence of the Eucalyptus grandis genome. Conclusions The Eucalyptus 1,845 gene map is the most highly enriched map for transcriptional information for any forest tree species to date. It represents a major improvement on the number of genes previously positioned on Eucalyptus maps and provides an initial glimpse at the gene space for this global tree genome. A general protocol is proposed to build high-density transcript linkage maps in less characterized plant species by SFP genotyping

  12. A constrained polynomial regression procedure for estimating the local False Discovery Rate

    Directory of Open Access Journals (Sweden)

    Broët Philippe

    2007-06-01

    Full Text Available Abstract Background In the context of genomic association studies, for which a large number of statistical tests are performed simultaneously, the local False Discovery Rate (lFDR, which quantifies the evidence of a specific gene association with a clinical or biological variable of interest, is a relevant criterion for taking into account the multiple testing problem. The lFDR not only allows an inference to be made for each gene through its specific value, but also an estimate of Benjamini-Hochberg's False Discovery Rate (FDR for subsets of genes. Results In the framework of estimating procedures without any distributional assumption under the alternative hypothesis, a new and efficient procedure for estimating the lFDR is described. The results of a simulation study indicated good performances for the proposed estimator in comparison to four published ones. The five different procedures were applied to real datasets. Conclusion A novel and efficient procedure for estimating lFDR was developed and evaluated.

  13. Developmental transitions in Arabidopsis are regulated by antisense RNAs resulting from bidirectionally transcribed genes.

    Science.gov (United States)

    Krzyczmonik, Katarzyna; Wroblewska-Swiniarska, Agata; Swiezewski, Szymon

    2017-07-03

    Transcription terminators are DNA elements located at the 3' end of genes that ensure efficient cleavage of nascent RNA generating the 3' end of mRNA, as well as facilitating disengagement of elongating DNA-dependent RNA polymerase II. Surprisingly, terminators are also a potent source of antisense transcription. We have recently described an Arabidopsis antisense transcript originating from the 3' end of a master regulator of Arabidopsis thaliana seed dormancy DOG1. In this review, we discuss the broader implications of our discovery in light of recent developments in yeast and Arabidopsis. We show that, surprisingly, the key features of terminators that give rise to antisense transcription are preserved between Arabidopsis and yeast, suggesting a conserved mechanism. We also compare our discovery to known antisense-based regulatory mechanisms, highlighting the link between antisense-based gene expression regulation and major developmental transitions in plants.

  14. Discovery of novel bacterial toxins by genomics and computational biology.

    Science.gov (United States)

    Doxey, Andrew C; Mansfield, Michael J; Montecucco, Cesare

    2018-06-01

    Hundreds and hundreds of bacterial protein toxins are presently known. Traditionally, toxin identification begins with pathological studies of bacterial infectious disease. Following identification and cultivation of a bacterial pathogen, the protein toxin is purified from the culture medium and its pathogenic activity is studied using the methods of biochemistry and structural biology, cell biology, tissue and organ biology, and appropriate animal models, supplemented by bioimaging techniques. The ongoing and explosive development of high-throughput DNA sequencing and bioinformatic approaches have set in motion a revolution in many fields of biology, including microbiology. One consequence is that genes encoding novel bacterial toxins can be identified by bioinformatic and computational methods based on previous knowledge accumulated from studies of the biology and pathology of thousands of known bacterial protein toxins. Starting from the paradigmatic cases of diphtheria toxin, tetanus and botulinum neurotoxins, this review discusses traditional experimental approaches as well as bioinformatics and genomics-driven approaches that facilitate the discovery of novel bacterial toxins. We discuss recent work on the identification of novel botulinum-like toxins from genera such as Weissella, Chryseobacterium, and Enteroccocus, and the implications of these computationally identified toxins in the field. Finally, we discuss the promise of metagenomics in the discovery of novel toxins and their ecological niches, and present data suggesting the existence of uncharacterized, botulinum-like toxin genes in insect gut metagenomes. Copyright © 2018. Published by Elsevier Ltd.

  15. Integrative subtype discovery in glioblastoma using iCluster.

    Directory of Open Access Journals (Sweden)

    Ronglai Shen

    Full Text Available Large-scale cancer genome projects, such as the Cancer Genome Atlas (TCGA project, are comprehensive molecular characterization efforts to accelerate our understanding of cancer biology and the discovery of new therapeutic targets. The accumulating wealth of multidimensional data provides a new paradigm for important research problems including cancer subtype discovery. The current standard approach relies on separate clustering analyses followed by manual integration. Results can be highly data type dependent, restricting the ability to discover new insights from multidimensional data. In this study, we present an integrative subtype analysis of the TCGA glioblastoma (GBM data set. Our analysis revealed new insights through integrated subtype characterization. We found three distinct integrated tumor subtypes. Subtype 1 lacks the classical GBM events of chr 7 gain and chr 10 loss. This subclass is enriched for the G-CIMP phenotype and shows hypermethylation of genes involved in brain development and neuronal differentiation. The tumors in this subclass display a Proneural expression profile. Subtype 2 is characterized by a near complete association with EGFR amplification, overrepresentation of promoter methylation of homeobox and G-protein signaling genes, and a Classical expression profile. Subtype 3 is characterized by NF1 and PTEN alterations and exhibits a Mesenchymal-like expression profile. The data analysis workflow we propose provides a unified and computationally scalable framework to harness the full potential of large-scale integrated cancer genomic data for integrative subtype discovery.

  16. Identifying key genes in rheumatoid arthritis by weighted gene co-expression network analysis.

    Science.gov (United States)

    Ma, Chunhui; Lv, Qi; Teng, Songsong; Yu, Yinxian; Niu, Kerun; Yi, Chengqin

    2017-08-01

    This study aimed to identify rheumatoid arthritis (RA) related genes based on microarray data using the WGCNA (weighted gene co-expression network analysis) method. Two gene expression profile datasets GSE55235 (10 RA samples and 10 healthy controls) and GSE77298 (16 RA samples and seven healthy controls) were downloaded from Gene Expression Omnibus database. Characteristic genes were identified using metaDE package. WGCNA was used to find disease-related networks based on gene expression correlation coefficients, and module significance was defined as the average gene significance of all genes used to assess the correlation between the module and RA status. Genes in the disease-related gene co-expression network were subject to functional annotation and pathway enrichment analysis using Database for Annotation Visualization and Integrated Discovery. Characteristic genes were also mapped to the Connectivity Map to screen small molecules. A total of 599 characteristic genes were identified. For each dataset, characteristic genes in the green, red and turquoise modules were most closely associated with RA, with gene numbers of 54, 43 and 79, respectively. These genes were enriched in totally enriched in 17 Gene Ontology terms, mainly related to immune response (CD97, FYB, CXCL1, IKBKE, CCR1, etc.), inflammatory response (CD97, CXCL1, C3AR1, CCR1, LYZ, etc.) and homeostasis (C3AR1, CCR1, PLN, CCL19, PPT1, etc.). Two small-molecule drugs sanguinarine and papaverine were predicted to have a therapeutic effect against RA. Genes related to immune response, inflammatory response and homeostasis presumably have critical roles in RA pathogenesis. Sanguinarine and papaverine have a potential therapeutic effect against RA. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.

  17. Study of miRNA Based Gene Regulation, Involved in Solid Cancer, by the Assistance of Argonaute Protein

    Directory of Open Access Journals (Sweden)

    Surya Narayan Rath

    2016-09-01

    Full Text Available Solid tumor is generally observed in tissues of epithelial or endothelial cells of lung, breast, prostate, pancreases, colorectal, stomach, and bladder, where several genes transcription is regulated by the microRNAs (miRNAs. Argonaute (AGO protein is a family of protein which assists in miRNAs to bind with mRNAs of the target genes. Hence, study of the binding mechanism between AGO protein and miRNAs, and also with miRNAs-mRNAs duplex is crucial for understanding the RNA silencing mechanism. In the current work, 64 genes and 23 miRNAs have been selected from literatures, whose deregulation is well established in seven types of solid cancer like lung, breast, prostate, pancreases, colorectal, stomach, and bladder cancer. In silico study reveals, miRNAs namely, miR-106a, miR-21, and miR-29b-2 have a strong binding affinity towards PTEN, TGFBR2, and VEGFA genes, respectively, suggested as important factors in RNA silencing mechanism. Furthermore, interaction between AGO protein (PDB ID-3F73, chain A with selected miRNAs and with miRNAs-mRNAs duplex were studied computationally to understand their binding at molecular level. The residual interaction and hydrogen bonding are inspected in Discovery Studio 3.5 suites. The current investigation throws light on understanding miRNAs based gene silencing mechanism in solid cancer.

  18. The Tangled Tale of Genes and Environment: Moore's The Dependent Gene: The Fallacy of “nature VS. Nurture”

    OpenAIRE

    Schneider, Susan M

    2007-01-01

    Nature–nurture views that smack of genetic determinism remain prevalent. Yet, the increasing knowledge base shows ever more clearly that environmental factors and genes form a fully interactional system at all levels. Moore's book covers the major topics of discovery and dispute, including behavior genetics and the twin studies, developmental psychobiology, and developmental systems theory. Knowledge of this larger life-sciences context for behavior principles will become increasingly importa...

  19. Network-based approaches to climate knowledge discovery

    Science.gov (United States)

    Budich, Reinhard; Nyberg, Per; Weigel, Tobias

    2011-11-01

    Climate Knowledge Discovery Workshop; Hamburg, Germany, 30 March to 1 April 2011 Do complex networks combined with semantic Web technologies offer the next generation of solutions in climate science? To address this question, a first Climate Knowledge Discovery (CKD) Workshop, hosted by the German Climate Computing Center (Deutsches Klimarechenzentrum (DKRZ)), brought together climate and computer scientists from major American and European laboratories, data centers, and universities, as well as representatives from industry, the broader academic community, and the semantic Web communities. The participants, representing six countries, were concerned with large-scale Earth system modeling and computational data analysis. The motivation for the meeting was the growing problem that climate scientists generate data faster than it can be interpreted and the need to prepare for further exponential data increases. Current analysis approaches are focused primarily on traditional methods, which are best suited for large-scale phenomena and coarse-resolution data sets. The workshop focused on the open discussion of ideas and technologies to provide the next generation of solutions to cope with the increasing data volumes in climate science.

  20. Systems Biology Modeling of the Radiation Sensitivity Network: A Biomarker Discovery Platform

    International Nuclear Information System (INIS)

    Eschrich, Steven; Zhang Hongling; Zhao Haiyan; Boulware, David; Lee, Ji-Hyun; Bloom, Gregory; Torres-Roca, Javier F.

    2009-01-01

    Purpose: The discovery of effective biomarkers is a fundamental goal of molecular medicine. Developing a systems-biology understanding of radiosensitivity can enhance our ability of identifying radiation-specific biomarkers. Methods and Materials: Radiosensitivity, as represented by the survival fraction at 2 Gy was modeled in 48 human cancer cell lines. We applied a linear regression algorithm that integrates gene expression with biological variables, including ras status (mut/wt), tissue of origin and p53 status (mut/wt). Results: The biomarker discovery platform is a network representation of the top 500 genes identified by linear regression analysis. This network was reduced to a 10-hub network that includes c-Jun, HDAC1, RELA (p65 subunit of NFKB), PKC-beta, SUMO-1, c-Abl, STAT1, AR, CDK1, and IRF1. Nine targets associated with radiosensitization drugs are linked to the network, demonstrating clinical relevance. Furthermore, the model identified four significant radiosensitivity clusters of terms and genes. Ras was a dominant variable in the analysis, as was the tissue of origin, and their interaction with gene expression but not p53. Overrepresented biological pathways differed between clusters but included DNA repair, cell cycle, apoptosis, and metabolism. The c-Jun network hub was validated using a knockdown approach in 8 human cell lines representing lung, colon, and breast cancers. Conclusion: We have developed a novel radiation-biomarker discovery platform using a systems biology modeling approach. We believe this platform will play a central role in the integration of biology into clinical radiation oncology practice.

  1. A Ligand-observed Mass Spectrometry Approach Integrated into the Fragment Based Lead Discovery Pipeline

    Science.gov (United States)

    Chen, Xin; Qin, Shanshan; Chen, Shuai; Li, Jinlong; Li, Lixin; Wang, Zhongling; Wang, Quan; Lin, Jianping; Yang, Cheng; Shui, Wenqing

    2015-01-01

    In fragment-based lead discovery (FBLD), a cascade combining multiple orthogonal technologies is required for reliable detection and characterization of fragment binding to the target. Given the limitations of the mainstream screening techniques, we presented a ligand-observed mass spectrometry approach to expand the toolkits and increase the flexibility of building a FBLD pipeline especially for tough targets. In this study, this approach was integrated into a FBLD program targeting the HCV RNA polymerase NS5B. Our ligand-observed mass spectrometry analysis resulted in the discovery of 10 hits from a 384-member fragment library through two independent screens of complex cocktails and a follow-up validation assay. Moreover, this MS-based approach enabled quantitative measurement of weak binding affinities of fragments which was in general consistent with SPR analysis. Five out of the ten hits were then successfully translated to X-ray structures of fragment-bound complexes to lay a foundation for structure-based inhibitor design. With distinctive strengths in terms of high capacity and speed, minimal method development, easy sample preparation, low material consumption and quantitative capability, this MS-based assay is anticipated to be a valuable addition to the repertoire of current fragment screening techniques. PMID:25666181

  2. An agent-based peer-to-peer architecture for semantic discovery of manufacturing services across virtual enterprises

    Science.gov (United States)

    Zhang, Wenyu; Zhang, Shuai; Cai, Ming; Jian, Wu

    2015-04-01

    With the development of virtual enterprise (VE) paradigm, the usage of serviceoriented architecture (SOA) is increasingly being considered for facilitating the integration and utilisation of distributed manufacturing resources. However, due to the heterogeneous nature among VEs, the dynamic nature of a VE and the autonomous nature of each VE member, the lack of both sophisticated coordination mechanism in the popular centralised infrastructure and semantic expressivity in the existing SOA standards make the current centralised, syntactic service discovery method undesirable. This motivates the proposed agent-based peer-to-peer (P2P) architecture for semantic discovery of manufacturing services across VEs. Multi-agent technology provides autonomous and flexible problemsolving capabilities in dynamic and adaptive VE environments. Peer-to-peer overlay provides highly scalable coupling across decentralised VEs, each of which exhibiting as a peer composed of multiple agents dealing with manufacturing services. The proposed architecture utilises a novel, efficient, two-stage search strategy - semantic peer discovery and semantic service discovery - to handle the complex searches of manufacturing services across VEs through fast peer filtering. The operation and experimental evaluation of the prototype system are presented to validate the implementation of the proposed approach.

  3. Simulated JWST/NIRISS Transit Spectroscopy of Anticipated TESS Planets Compared to Select Discoveries from Space-Based and Ground-Based Surveys

    Science.gov (United States)

    Louie, Dana; Deming, Drake; Albert, Loic; Bouma, Luke; Bean, Jacob; Lopez-Morales, Mercedes

    2018-01-01

    The Transiting Exoplanet Survey Satellite (TESS) will embark in 2018 on a 2-year wide-field survey mission of most of the celestial sky, discovering over a thousand super-Earth and sub-Neptune-sized exoplanets potentially suitable for follow-up observations using the James Webb Space Telescope (JWST). Bouma et al. (2017) and Sullivan et al. (2015) used Monte Carlo simulations to predict the properties of the planetary systems that TESS is likely to detect, basing their simulations upon Kepler-derived planet occurrence rates and photometric performance models for the TESS cameras. We employed a JWST Near InfraRed Imager and Slitless Spectrograph (NIRISS) simulation tool to estimate the signal-to-noise (S/N) that JWST/NIRISS will attain in transmission spectroscopy of these anticipated TESS discoveries, and we then compared the S/N for anticipated TESS discoveries to our estimates of S/N for 18 known exoplanets. We analyzed the sensitivity of our results to planetary composition, cloud cover, and presence of an observational noise floor. We find that only a few anticipated TESS discoveries in the terrestrial planet regime will result in better JWST/NIRISS S/N than currently known exoplanets, such as the TRAPPIST-1 planets, GJ1132b, or LHS1140b. However, we emphasize that this outcome is based upon Kepler-derived occurrence rates, and that co-planar compact systems (e.g. TRAPPIST-1) were not included in predicting the anticipated TESS planet yield. Furthermore, our results show that several hundred anticipated TESS discoveries in the super-Earth and sub-Neptune regime will produce S/N higher than currently known exoplanets such as K2-3b or K2-3c. We apply our results to estimate the scope of a JWST follow-up observation program devoted to mapping the transition region between high molecular weight and primordial planetary atmospheres.

  4. Volatility Discovery

    DEFF Research Database (Denmark)

    Dias, Gustavo Fruet; Scherrer, Cristina; Papailias, Fotis

    The price discovery literature investigates how homogenous securities traded on different markets incorporate information into prices. We take this literature one step further and investigate how these markets contribute to stochastic volatility (volatility discovery). We formally show...... that the realized measures from homogenous securities share a fractional stochastic trend, which is a combination of the price and volatility discovery measures. Furthermore, we show that volatility discovery is associated with the way that market participants process information arrival (market sensitivity......). Finally, we compute volatility discovery for 30 actively traded stocks in the U.S. and report that Nyse and Arca dominate Nasdaq....

  5. A Performance/Cost Evaluation for a GPU-Based Drug Discovery Application on Volunteer Computing

    Science.gov (United States)

    Guerrero, Ginés D.; Imbernón, Baldomero; García, José M.

    2014-01-01

    Bioinformatics is an interdisciplinary research field that develops tools for the analysis of large biological databases, and, thus, the use of high performance computing (HPC) platforms is mandatory for the generation of useful biological knowledge. The latest generation of graphics processing units (GPUs) has democratized the use of HPC as they push desktop computers to cluster-level performance. Many applications within this field have been developed to leverage these powerful and low-cost architectures. However, these applications still need to scale to larger GPU-based systems to enable remarkable advances in the fields of healthcare, drug discovery, genome research, etc. The inclusion of GPUs in HPC systems exacerbates power and temperature issues, increasing the total cost of ownership (TCO). This paper explores the benefits of volunteer computing to scale bioinformatics applications as an alternative to own large GPU-based local infrastructures. We use as a benchmark a GPU-based drug discovery application called BINDSURF that their computational requirements go beyond a single desktop machine. Volunteer computing is presented as a cheap and valid HPC system for those bioinformatics applications that need to process huge amounts of data and where the response time is not a critical factor. PMID:25025055

  6. The rise of fragment-based drug discovery.

    Science.gov (United States)

    Murray, Christopher W; Rees, David C

    2009-06-01

    The search for new drugs is plagued by high attrition rates at all stages in research and development. Chemists have an opportunity to tackle this problem because attrition can be traced back, in part, to the quality of the chemical leads. Fragment-based drug discovery (FBDD) is a new approach, increasingly used in the pharmaceutical industry, for reducing attrition and providing leads for previously intractable biological targets. FBDD identifies low-molecular-weight ligands (∼150 Da) that bind to biologically important macromolecules. The three-dimensional experimental binding mode of these fragments is determined using X-ray crystallography or NMR spectroscopy, and is used to facilitate their optimization into potent molecules with drug-like properties. Compared with high-throughput-screening, the fragment approach requires fewer compounds to be screened, and, despite the lower initial potency of the screening hits, offers more efficient and fruitful optimization campaigns. Here, we review the rise of FBDD, including its application to discovering clinical candidates against targets for which other chemistry approaches have struggled.

  7. Fragment based drug discovery: practical implementation based on ¹⁹F NMR spectroscopy.

    Science.gov (United States)

    Jordan, John B; Poppe, Leszek; Xia, Xiaoyang; Cheng, Alan C; Sun, Yax; Michelsen, Klaus; Eastwood, Heather; Schnier, Paul D; Nixey, Thomas; Zhong, Wenge

    2012-01-26

    Fragment based drug discovery (FBDD) is a widely used tool for discovering novel therapeutics. NMR is a powerful means for implementing FBDD, and several approaches have been proposed utilizing (1)H-(15)N heteronuclear single quantum coherence (HSQC) as well as one-dimensional (1)H and (19)F NMR to screen compound mixtures against a target of interest. While proton-based NMR methods of fragment screening (FBS) have been well documented and are widely used, the use of (19)F detection in FBS has been only recently introduced (Vulpetti et al. J. Am. Chem. Soc.2009, 131 (36), 12949-12959) with the aim of targeting "fluorophilic" sites in proteins. Here, we demonstrate a more general use of (19)F NMR-based fragment screening in several areas: as a key tool for rapid and sensitive detection of fragment hits, as a method for the rapid development of structure-activity relationship (SAR) on the hit-to-lead path using in-house libraries and/or commercially available compounds, and as a quick and efficient means of assessing target druggability.

  8. Genome-wide siRNA-based functional genomics of pigmentation identifies novel genes and pathways that impact melanogenesis in human cells.

    Directory of Open Access Journals (Sweden)

    Anand K Ganesan

    2008-12-01

    Full Text Available Melanin protects the skin and eyes from the harmful effects of UV irradiation, protects neural cells from toxic insults, and is required for sound conduction in the inner ear. Aberrant regulation of melanogenesis underlies skin disorders (melasma and vitiligo, neurologic disorders (Parkinson's disease, auditory disorders (Waardenburg's syndrome, and opthalmologic disorders (age related macular degeneration. Much of the core synthetic machinery driving melanin production has been identified; however, the spectrum of gene products participating in melanogenesis in different physiological niches is poorly understood. Functional genomics based on RNA-mediated interference (RNAi provides the opportunity to derive unbiased comprehensive collections of pharmaceutically tractable single gene targets supporting melanin production. In this study, we have combined a high-throughput, cell-based, one-well/one-gene screening platform with a genome-wide arrayed synthetic library of chemically synthesized, small interfering RNAs to identify novel biological pathways that govern melanin biogenesis in human melanocytes. Ninety-two novel genes that support pigment production were identified with a low false discovery rate. Secondary validation and preliminary mechanistic studies identified a large panel of targets that converge on tyrosinase expression and stability. Small molecule inhibition of a family of gene products in this class was sufficient to impair chronic tyrosinase expression in pigmented melanoma cells and UV-induced tyrosinase expression in primary melanocytes. Isolation of molecular machinery known to support autophagosome biosynthesis from this screen, together with in vitro and in vivo validation, exposed a close functional relationship between melanogenesis and autophagy. In summary, these studies illustrate the power of RNAi-based functional genomics to identify novel genes, pathways, and pharmacologic agents that impact a biological phenotype

  9. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    Energy Technology Data Exchange (ETDEWEB)

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  10. Proteomics for discovery of candidate colorectal cancer biomarkers

    Science.gov (United States)

    Álvarez-Chaver, Paula; Otero-Estévez, Olalla; Páez de la Cadena, María; Rodríguez-Berrocal, Francisco J; Martínez-Zorzano, Vicenta S

    2014-01-01

    Colorectal cancer (CRC) is the second most common cause of cancer-related deaths in Europe and other Western countries, mainly due to the lack of well-validated clinically useful biomarkers with enough sensitivity and specificity to detect this disease at early stages. Although it is well known that the pathogenesis of CRC is a progressive accumulation of mutations in multiple genes, much less is known at the proteome level. Therefore, in the last years many proteomic studies have been conducted to find new candidate protein biomarkers for diagnosis, prognosis and as therapeutic targets for this malignancy, as well as to elucidate the molecular mechanisms of colorectal carcinogenesis. An important advantage of the proteomic approaches is the capacity to look for multiple differentially expressed proteins in a single study. This review provides an overview of the recent reports describing the different proteomic tools used for the discovery of new protein markers for CRC such as two-dimensional electrophoresis methods, quantitative mass spectrometry-based techniques or protein microarrays. Additionally, we will also focus on the diverse biological samples used for CRC biomarker discovery such as tissue, serum and faeces, besides cell lines and murine models, discussing their advantages and disadvantages, and summarize the most frequently identified candidate CRC markers. PMID:24744574

  11. Exploiting adaptive laboratory evolution of Streptomyces clavuligerus for antibiotic discovery and overproduction.

    Directory of Open Access Journals (Sweden)

    Pep Charusanti

    Full Text Available Adaptation is normally viewed as the enemy of the antibiotic discovery and development process because adaptation among pathogens to antibiotic exposure leads to resistance. We present a method here that, in contrast, exploits the power of adaptation among antibiotic producers to accelerate the discovery of antibiotics. A competition-based adaptive laboratory evolution scheme is presented whereby an antibiotic-producing microorganism is competed against a target pathogen and serially passed over time until the producer evolves the ability to synthesize a chemical entity that inhibits growth of the pathogen. When multiple Streptomyces clavuligerus replicates were adaptively evolved against methicillin-resistant Staphylococcus aureus N315 in this manner, a strain emerged that acquired the ability to constitutively produce holomycin. In contrast, no holomycin could be detected from the unevolved wild-type strain. Moreover, genome re-sequencing revealed that the evolved strain had lost pSCL4, a large 1.8 Mbp plasmid, and acquired several single nucleotide polymorphisms in genes that have been shown to affect secondary metabolite biosynthesis. These results demonstrate that competition-based adaptive laboratory evolution can constitute a platform to create mutants that overproduce known antibiotics and possibly to discover new compounds as well.

  12. Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

    Science.gov (United States)

    2011-01-01

    Background Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp. tritici (Pst), and poplar

  13. Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

    Directory of Open Access Journals (Sweden)

    Wynhoven Brian

    2011-03-01

    Full Text Available Abstract Background Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt, we have generated Expressed Sequence Tags (ESTs by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores and asexual (germinated urediniospores stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum, 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs. Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt and stripe rust, P. striiformis f. sp

  14. Research on hotspot discovery in internet public opinions based on improved K-means.

    Science.gov (United States)

    Wang, Gensheng

    2013-01-01

    How to discover hotspot in the Internet public opinions effectively is a hot research field for the researchers related which plays a key role for governments and corporations to find useful information from mass data in the Internet. An improved K-means algorithm for hotspot discovery in internet public opinions is presented based on the analysis of existing defects and calculation principle of original K-means algorithm. First, some new methods are designed to preprocess website texts, select and express the characteristics of website texts, and define the similarity between two website texts, respectively. Second, clustering principle and the method of initial classification centers selection are analyzed and improved in order to overcome the limitations of original K-means algorithm. Finally, the experimental results verify that the improved algorithm can improve the clustering stability and classification accuracy of hotspot discovery in internet public opinions when used in practice.

  15. Research on Hotspot Discovery in Internet Public Opinions Based on Improved K-Means

    Science.gov (United States)

    2013-01-01

    How to discover hotspot in the Internet public opinions effectively is a hot research field for the researchers related which plays a key role for governments and corporations to find useful information from mass data in the Internet. An improved K-means algorithm for hotspot discovery in internet public opinions is presented based on the analysis of existing defects and calculation principle of original K-means algorithm. First, some new methods are designed to preprocess website texts, select and express the characteristics of website texts, and define the similarity between two website texts, respectively. Second, clustering principle and the method of initial classification centers selection are analyzed and improved in order to overcome the limitations of original K-means algorithm. Finally, the experimental results verify that the improved algorithm can improve the clustering stability and classification accuracy of hotspot discovery in internet public opinions when used in practice. PMID:24106496

  16. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    Science.gov (United States)

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ ( geneanalytics.genecards.org ), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards-the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  17. Assessment of Dengue virus helicase and methyltransferase as targets for fragment-based drug discovery.

    Science.gov (United States)

    Coutard, Bruno; Decroly, Etienne; Li, Changqing; Sharff, Andrew; Lescar, Julien; Bricogne, Gérard; Barral, Karine

    2014-06-01

    Seasonal and pandemic flaviviruses continue to be leading global health concerns. With the view to help drug discovery against Dengue virus (DENV), a fragment-based experimental approach was applied to identify small molecule ligands targeting two main components of the flavivirus replication complex: the NS3 helicase (Hel) and the NS5 mRNA methyltransferase (MTase) domains. A library of 500 drug-like fragments was first screened by thermal-shift assay (TSA) leading to the identification of 36 and 32 fragment hits binding Hel and MTase from DENV, respectively. In a second stage, we set up a fragment-based X-ray crystallographic screening (FBS-X) in order to provide both validated fragment hits and structural binding information. No fragment hit was confirmed for DENV Hel. In contrast, a total of seven fragments were identified as DENV MTase binders and structures of MTase-fragment hit complexes were solved at resolution at least 2.0Å or better. All fragment hits identified contain either a five- or six-membered aromatic ring or both, and three novel binding sites were located on the MTase. To further characterize the fragment hits identified by TSA and FBS-X, we performed enzymatic assays to assess their inhibition effect on the N7- and 2'-O-MTase enzymatic activities: five of these fragment hits inhibit at least one of the two activities with IC50 ranging from 180μM to 9mM. This work validates the FBS-X strategy for identifying new anti-flaviviral hits targeting MTase, while Hel might not be an amenable target for fragment-based drug discovery (FBDD). This approach proved to be a fast and efficient screening method for FBDD target validation and discovery of starting hits for the development of higher affinity molecules that bind to novel allosteric sites. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Discovery, biosynthesis, and rational engineering of novel enterocin and wailupemycin polyketide analogues.

    Science.gov (United States)

    Kalaitzis, John A

    2013-01-01

    The marine actinomycete Streptomyces maritimus produces a structurally diverse set of unusual polyketide natural products including the major metabolite enterocin. Investigations of enterocin biosynthesis revealed that the unique carbon skeleton is derived from an aromatic polyketide pathway which is genetically coded by the 21.3 kb enc gene cluster in S. maritimus. Characterization of the enc biosynthesis gene cluster and subsequent manipulation of it via heterologous expression and/or mutagenesis enabled the discovery of other enc-based metabolites that were produced in only very minor amounts in the wild type. Also described are techniques used to harness the enterocin biosynthetic machinery in order to generate unnatural enc-derived polyketide analogues. This review focuses upon the molecular methods used in combination with classical natural products detection and isolation techniques to access minor metabolites of the S. maritimus secondary metabolome.

  19. Discovery of accessible locations using region-based geo-social data

    KAUST Repository

    Wang, Yan

    2018-03-17

    Geo-social data plays a significant role in location discovery and recommendation. In this light, we propose and study a novel problem of discovering accessible locations in spatial networks using region-based geo-social data. Given a set Q of query regions, the top-k accessible location discovery query (k ALDQ) finds k locations that have the highest spatial-density correlations to Q. Both the spatial distances between locations and regions and the POI (point of interest) density within the regions are taken into account. We believe that this type of k ALDQ query can bring significant benefit to many applications such as travel planning, facility allocation, and urban planning. Three challenges exist in k ALDQ: (1) how to model the spatial-density correlation practically, (2) how to prune the search space effectively, and (3) how to schedule the searches from multiple query regions. To tackle the challenges and process k ALDQ effectively and efficiently, we first define a series of spatial and density metrics to model the spatial-density correlation. Then we propose a novel three-phase solution with a pair of upper and lower bounds of the spatial-density correlation and a heuristic scheduling strategy to schedule multiple query regions. Finally, we conduct extensive experiments on real and synthetic spatial data to demonstrate the performance of the developed solutions.

  20. Methodological issues in detecting gene-gene interactions in breast cancer susceptibility: a population-based study in Ontario

    Directory of Open Access Journals (Sweden)

    Onay Venus

    2007-08-01

    Full Text Available Abstract Background There is growing evidence that gene-gene interactions are ubiquitous in determining the susceptibility to common human diseases. The investigation of such gene-gene interactions presents new statistical challenges for studies with relatively small sample sizes as the number of potential interactions in the genome can be large. Breast cancer provides a useful paradigm to study genetically complex diseases because commonly occurring single nucleotide polymorphisms (SNPs may additively or synergistically disturb the system-wide communication of the cellular processes leading to cancer development. Methods In this study, we systematically studied SNP-SNP interactions among 19 SNPs from 18 key genes involved in major cancer pathways in a sample of 398 breast cancer cases and 372 controls from Ontario. We discuss the methodological issues associated with the detection of SNP-SNP interactions in this dataset by applying and comparing three commonly used methods: the logistic regression model, classification and regression trees (CART, and the multifactor dimensionality reduction (MDR method. Results Our analyses show evidence for several simple (two-way and complex (multi-way SNP-SNP interactions associated with breast cancer. For example, all three methods identified XPD-[Lys751Gln]*IL10-[G(-1082A] as the most significant two-way interaction. CART and MDR identified the same critical SNPs participating in complex interactions. Our results suggest that the use of multiple statistical approaches (or an integrated approach rather than a single methodology could be the best strategy to elucidate complex gene interactions that have generally very different patterns. Conclusion The strategy used here has the potential to identify complex biological relationships among breast cancer genes and processes. This will lead to the discovery of novel biological information, which will improve breast cancer risk management.

  1. Natural Products for Drug Discovery in the 21st Century: Innovations for Novel Drug Discovery

    Directory of Open Access Journals (Sweden)

    Nicholas Ekow Thomford

    2018-05-01

    Full Text Available The therapeutic properties of plants have been recognised since time immemorial. Many pathological conditions have been treated using plant-derived medicines. These medicines are used as concoctions or concentrated plant extracts without isolation of active compounds. Modern medicine however, requires the isolation and purification of one or two active compounds. There are however a lot of global health challenges with diseases such as cancer, degenerative diseases, HIV/AIDS and diabetes, of which modern medicine is struggling to provide cures. Many times the isolation of “active compound” has made the compound ineffective. Drug discovery is a multidimensional problem requiring several parameters of both natural and synthetic compounds such as safety, pharmacokinetics and efficacy to be evaluated during drug candidate selection. The advent of latest technologies that enhance drug design hypotheses such as Artificial Intelligence, the use of ‘organ-on chip’ and microfluidics technologies, means that automation has become part of drug discovery. This has resulted in increased speed in drug discovery and evaluation of the safety, pharmacokinetics and efficacy of candidate compounds whilst allowing novel ways of drug design and synthesis based on natural compounds. Recent advances in analytical and computational techniques have opened new avenues to process complex natural products and to use their structures to derive new and innovative drugs. Indeed, we are in the era of computational molecular design, as applied to natural products. Predictive computational softwares have contributed to the discovery of molecular targets of natural products and their derivatives. In future the use of quantum computing, computational softwares and databases in modelling molecular interactions and predicting features and parameters needed for drug development, such as pharmacokinetic and pharmacodynamics, will result in few false positive leads in drug

  2. Natural Products for Drug Discovery in the 21st Century: Innovations for Novel Drug Discovery.

    Science.gov (United States)

    Thomford, Nicholas Ekow; Senthebane, Dimakatso Alice; Rowe, Arielle; Munro, Daniella; Seele, Palesa; Maroyi, Alfred; Dzobo, Kevin

    2018-05-25

    The therapeutic properties of plants have been recognised since time immemorial. Many pathological conditions have been treated using plant-derived medicines. These medicines are used as concoctions or concentrated plant extracts without isolation of active compounds. Modern medicine however, requires the isolation and purification of one or two active compounds. There are however a lot of global health challenges with diseases such as cancer, degenerative diseases, HIV/AIDS and diabetes, of which modern medicine is struggling to provide cures. Many times the isolation of "active compound" has made the compound ineffective. Drug discovery is a multidimensional problem requiring several parameters of both natural and synthetic compounds such as safety, pharmacokinetics and efficacy to be evaluated during drug candidate selection. The advent of latest technologies that enhance drug design hypotheses such as Artificial Intelligence, the use of 'organ-on chip' and microfluidics technologies, means that automation has become part of drug discovery. This has resulted in increased speed in drug discovery and evaluation of the safety, pharmacokinetics and efficacy of candidate compounds whilst allowing novel ways of drug design and synthesis based on natural compounds. Recent advances in analytical and computational techniques have opened new avenues to process complex natural products and to use their structures to derive new and innovative drugs. Indeed, we are in the era of computational molecular design, as applied to natural products. Predictive computational softwares have contributed to the discovery of molecular targets of natural products and their derivatives. In future the use of quantum computing, computational softwares and databases in modelling molecular interactions and predicting features and parameters needed for drug development, such as pharmacokinetic and pharmacodynamics, will result in few false positive leads in drug development. This review

  3. The web server of IBM's Bioinformatics and Pattern Discovery group.

    Science.gov (United States)

    Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo

    2003-07-01

    We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.

  4. Progress toward Gene Therapy for Duchenne Muscular Dystrophy.

    Science.gov (United States)

    Chamberlain, Joel R; Chamberlain, Jeffrey S

    2017-05-03

    Duchenne muscular dystrophy (DMD) has been a major target for gene therapy development for nearly 30 years. DMD is among the most common genetic diseases, and isolation of the defective gene (DMD, or dystrophin) was a landmark discovery, as it was the first time a human disease gene had been cloned without knowledge of the protein product. Despite tremendous obstacles, including the enormous size of the gene and the large volume of muscle tissue in the human body, efforts to devise a treatment based on gene replacement have advanced steadily through the combined efforts of dozens of labs and patient advocacy groups. Progress in the development of DMD gene therapy has been well documented in Molecular Therapy over the past 20 years and will be reviewed here to highlight prospects for success in the imminent human clinical trials planned by several groups. Copyright © 2017 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.

  5. Discovery of Antibiotics-derived Polymers for Gene Delivery using Combinatorial Synthesis and Cheminformatics Modeling

    Science.gov (United States)

    Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal

    2014-01-01

    We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both, aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709

  6. From Protein Structure to Small-Molecules: Recent Advances and Applications to Fragment-Based Drug Discovery.

    Science.gov (United States)

    Ferreira, Leonardo G; Andricopulo, Adriano D

    2017-01-01

    Fragment-based drug discovery (FBDD) is a broadly used strategy in structure-guided ligand design, whereby low-molecular weight hits move from lead-like to drug-like compounds. Over the past 15 years, an increasingly important role of the integration of these strategies into industrial and academic research platforms has been successfully established, allowing outstanding contributions to drug discovery. One important factor for the current prominence of FBDD is the better coverage of the chemical space provided by fragment-like libraries. The development of the field relies on two features: (i) the growing number of structurally characterized drug targets and (ii) the enormous chemical diversity available for experimental and virtual screenings. Indeed, fragment-based campaigns have contributed to address major challenges in lead optimization, such as the appropriate physicochemical profile of clinical candidates. This perspective paper outlines the usefulness and applications of FBDD approaches in medicinal chemistry and drug design. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  7. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    Science.gov (United States)

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  8. Cogena, a novel tool for co-expressed gene-set enrichment analysis, applied to drug repositioning and drug mode of action discovery.

    Science.gov (United States)

    Jia, Zhilong; Liu, Ying; Guan, Naiyang; Bo, Xiaochen; Luo, Zhigang; Barnes, Michael R

    2016-05-27

    Drug repositioning, finding new indications for existing drugs, has gained much recent attention as a potentially efficient and economical strategy for accelerating new therapies into the clinic. Although improvement in the sensitivity of computational drug repositioning methods has identified numerous credible repositioning opportunities, few have been progressed. Arguably the "black box" nature of drug action in a new indication is one of the main blocks to progression, highlighting the need for methods that inform on the broader target mechanism in the disease context. We demonstrate that the analysis of co-expressed genes may be a critical first step towards illumination of both disease pathology and mode of drug action. We achieve this using a novel framework, co-expressed gene-set enrichment analysis (cogena) for co-expression analysis of gene expression signatures and gene set enrichment analysis of co-expressed genes. The cogena framework enables simultaneous, pathway driven, disease and drug repositioning analysis. Cogena can be used to illuminate coordinated changes within disease transcriptomes and identify drugs acting mechanistically within this framework. We illustrate this using a psoriatic skin transcriptome, as an exemplar, and recover two widely used Psoriasis drugs (Methotrexate and Ciclosporin) with distinct modes of action. Cogena out-performs the results of Connectivity Map and NFFinder webservers in similar disease transcriptome analyses. Furthermore, we investigated the literature support for the other top-ranked compounds to treat psoriasis and showed how the outputs of cogena analysis can contribute new insight to support the progression of drugs into the clinic. We have made cogena freely available within Bioconductor or https://github.com/zhilongjia/cogena . In conclusion, by targeting co-expressed genes within disease transcriptomes, cogena offers novel biological insight, which can be effectively harnessed for drug discovery and

  9. Leveraging the contribution of thermodynamics in drug discovery with the help of fluorescence-based thermal shift assays.

    Science.gov (United States)

    Hau, Jean Christophe; Fontana, Patrizia; Zimmermann, Catherine; De Pover, Alain; Erdmann, Dirk; Chène, Patrick

    2011-06-01

    The development of new drugs with better pharmacological and safety properties mandates the optimization of several parameters. Today, potency is often used as the sole biochemical parameter to identify and select new molecules. Surprisingly, thermodynamics, which is at the core of any interaction, is rarely used in drug discovery, even though it has been suggested that the selection of scaffolds according to thermodynamic criteria may be a valuable strategy. This poor integration of thermodynamics in drug discovery might be due to difficulties in implementing calorimetry experiments despite recent technological progress in this area. In this report, the authors show that fluorescence-based thermal shift assays could be used as prescreening methods to identify compounds with different thermodynamic profiles. This approach allows a reduction in the number of compounds to be tested in calorimetry experiments, thus favoring greater integration of thermodynamics in drug discovery.

  10. Discovery and Early Development of AVI-7537 and AVI-7288 for the Treatment of Ebola Virus and Marburg Virus Infections

    Directory of Open Access Journals (Sweden)

    Sina Bavari

    2012-11-01

    Full Text Available There are no currently approved treatments for filovirus infections. In this study we report the discovery process which led to the development of antisense Phosphorodiamidate Morpholino Oligomers (PMOs AVI-6002 (composed of AVI-7357 and AVI-7539 and AVI-6003 (composed of AVI-7287 and AVI-7288 targeting Ebola virus and Marburg virus respectively. The discovery process involved identification of optimal transcript binding sites for PMO based RNA-therapeutics followed by screening for effective viral gene target in mouse and guinea pig models utilizing adapted viral isolates. An evolution of chemical modifications were tested, beginning with simple Phosphorodiamidate Morpholino Oligomers (PMO transitioning to cell penetrating peptide conjugated PMOs (PPMO and ending with PMOplus containing a limited number of positively charged linkages in the PMO structure. The initial lead compounds were combinations of two agents targeting separate genes. In the final analysis, a single agent for treatment of each virus was selected, AVI-7537 targeting the VP24 gene of Ebola virus and AVI-7288 targeting NP of Marburg virus, and are now progressing into late stage clinical development as the optimal therapeutic candidates.

  11. Neural network-based QSAR and insecticide discovery: spinetoram

    Science.gov (United States)

    Sparks, Thomas C.; Crouse, Gary D.; Dripps, James E.; Anzeveno, Peter; Martynow, Jacek; DeAmicis, Carl V.; Gifford, James

    2008-06-01

    Improvements in the efficacy and spectrum of the spinosyns, novel fermentation derived insecticide, has long been a goal within Dow AgroSciences. As large and complex fermentation products identifying specific modifications to the spinosyns likely to result in improved activity was a difficult process, since most modifications decreased the activity. A variety of approaches were investigated to identify new synthetic directions for the spinosyn chemistry including several explorations of the quantitative structure activity relationships (QSAR) of spinosyns, which initially were unsuccessful. However, application of artificial neural networks (ANN) to the spinosyn QSAR problem identified new directions for improved activity in the chemistry, which subsequent synthesis and testing confirmed. The ANN-based analogs coupled with other information on substitution effects resulting from spinosyn structure activity relationships lead to the discovery of spinetoram (XDE-175). Launched in late 2007, spinetoram provides both improved efficacy and an expanded spectrum while maintaining the exceptional environmental and toxicological profile already established for the spinosyn chemistry.

  12. A Performance/Cost Evaluation for a GPU-Based Drug Discovery Application on Volunteer Computing

    Directory of Open Access Journals (Sweden)

    Ginés D. Guerrero

    2014-01-01

    Full Text Available Bioinformatics is an interdisciplinary research field that develops tools for the analysis of large biological databases, and, thus, the use of high performance computing (HPC platforms is mandatory for the generation of useful biological knowledge. The latest generation of graphics processing units (GPUs has democratized the use of HPC as they push desktop computers to cluster-level performance. Many applications within this field have been developed to leverage these powerful and low-cost architectures. However, these applications still need to scale to larger GPU-based systems to enable remarkable advances in the fields of healthcare, drug discovery, genome research, etc. The inclusion of GPUs in HPC systems exacerbates power and temperature issues, increasing the total cost of ownership (TCO. This paper explores the benefits of volunteer computing to scale bioinformatics applications as an alternative to own large GPU-based local infrastructures. We use as a benchmark a GPU-based drug discovery application called BINDSURF that their computational requirements go beyond a single desktop machine. Volunteer computing is presented as a cheap and valid HPC system for those bioinformatics applications that need to process huge amounts of data and where the response time is not a critical factor.

  13. Fragment-based discovery of potent inhibitors of the anti-apoptotic MCL-1 protein.

    Science.gov (United States)

    Petros, Andrew M; Swann, Steven L; Song, Danying; Swinger, Kerren; Park, Chang; Zhang, Haichao; Wendt, Michael D; Kunzer, Aaron R; Souers, Andrew J; Sun, Chaohong

    2014-03-15

    Apoptosis is regulated by the BCL-2 family of proteins, which is comprised of both pro-death and pro-survival members. Evasion of apoptosis is a hallmark of malignant cells. One way in which cancer cells achieve this evasion is thru overexpression of the pro-survival members of the BCL-2 family. Overexpression of MCL-1, a pro-survival protein, has been shown to be a resistance factor for Navitoclax, a potent inhibitor of BCL-2 and BCL-XL. Here we describe the use of fragment screening methods and structural biology to drive the discovery of novel MCL-1 inhibitors from two distinct structural classes. Specifically, cores derived from a biphenyl sulfonamide and salicylic acid were uncovered in an NMR-based fragment screen and elaborated using high throughput analog synthesis. This culminated in the discovery of selective and potent inhibitors of MCL-1 that may serve as promising leads for medicinal chemistry optimization efforts. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Discovery of new enzymes and metabolic pathways using structure and genome context

    Science.gov (United States)

    Zhao, Suwen; Kumar, Ritesh; Sakai, Ayano; Vetting, Matthew W.; Wood, B. McKay; Brown, Shoshana; Bonanno, Jeffery B.; Hillerich, Brandan S.; Seidel, Ronald D.; Babbitt, Patricia C.; Almo, Steven C.; Sweedler, Jonathan V.; Gerlt, John A.; Cronan, John E.; Jacobson, Matthew P.

    2014-01-01

    Assigning valid functions to proteins identified in genome projects is challenging, with over-prediction and database annotation errors major concerns1. We, and others2, are developing computation-guided strategies for functional discovery using “metabolite docking” to experimentally derived3 or homology-based4 three-dimensional structures. Bacterial metabolic pathways often are encoded by “genome neighborhoods” (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by “predicting” the intermediates in the glycolytic pathway in E. coli5. Metabolite docking to multiple binding proteins/enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. We report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed i) the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and ii) the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guide functional predictions to enable the discovery of new metabolic pathways. PMID:24056934

  15. Context-sensitive service discovery experimental prototype and evaluation

    DEFF Research Database (Denmark)

    Balken, Robin; Haukrogh, Jesper; L. Jensen, Jens

    2007-01-01

    The amount of different networks and services available to users today are increasing. This introduces the need for a way to locate and sort out irrelevant services in the process of discovering available services to a user. This paper describes and evaluates a prototype of an automated discovery...... and selection system, which locates services relevant to a user, based on his/her context and the context of the available services. The prototype includes a multi-level, hierarchical system approach and the introduction of entities called User-nodes, Super-nodes and Root-nodes. These entities separate...... the network in domains that handle the complex distributed service discovery, which is based on dynamically changing context information. In the prototype, a method for performing context-sensitive service discovery has been realised. The service discovery part utilizes UPnP, which has been expanded in order...

  16. Binding thermodynamics discriminates fragments from druglike compounds: a thermodynamic description of fragment-based drug discovery.

    Science.gov (United States)

    Williams, Glyn; Ferenczy, György G; Ulander, Johan; Keserű, György M

    2017-04-01

    Small is beautiful - reducing the size and complexity of chemical starting points for drug design allows better sampling of chemical space, reveals the most energetically important interactions within protein-binding sites and can lead to improvements in the physicochemical properties of the final drug. The impact of fragment-based drug discovery (FBDD) on recent drug discovery projects and our improved knowledge of the structural and thermodynamic details of ligand binding has prompted us to explore the relationships between ligand-binding thermodynamics and FBDD. Information on binding thermodynamics can give insights into the contributions to protein-ligand interactions and could therefore be used to prioritise compounds with a high degree of specificity in forming key interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Paper-based synthetic gene networks.

    Science.gov (United States)

    Pardee, Keith; Green, Alexander A; Ferrante, Tom; Cameron, D Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J

    2014-11-06

    Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides an alternate, versatile venue for synthetic biologists to operate and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze dried onto paper, enabling the inexpensive, sterile, and abiotic distribution of synthetic-biology-based technologies for the clinic, global health, industry, research, and education. For field use, we create circuits with colorimetric outputs for detection by eye and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small-molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors.

  18. Paper-based Synthetic Gene Networks

    Science.gov (United States)

    Pardee, Keith; Green, Alexander A.; Ferrante, Tom; Cameron, D. Ewen; DaleyKeyser, Ajay; Yin, Peng; Collins, James J.

    2014-01-01

    Synthetic gene networks have wide-ranging uses in reprogramming and rewiring organisms. To date, there has not been a way to harness the vast potential of these networks beyond the constraints of a laboratory or in vivo environment. Here, we present an in vitro paper-based platform that provides a new venue for synthetic biologists to operate, and a much-needed medium for the safe deployment of engineered gene circuits beyond the lab. Commercially available cell-free systems are freeze-dried onto paper, enabling the inexpensive, sterile and abiotic distribution of synthetic biology-based technologies for the clinic, global health, industry, research and education. For field use, we create circuits with colorimetric outputs for detection by eye, and fabricate a low-cost, electronic optical interface. We demonstrate this technology with small molecule and RNA actuation of genetic switches, rapid prototyping of complex gene circuits, and programmable in vitro diagnostics, including glucose sensors and strain-specific Ebola virus sensors. PMID:25417167

  19. Genes involved in immunity and apoptosis are associated with human presbycusis based on microarray analysis.

    Science.gov (United States)

    Dong, Yang; Li, Ming; Liu, Puzhao; Song, Haiyan; Zhao, Yuping; Shi, Jianrong

    2014-06-01

    Genes involved in immunity and apoptosis were associated with human presbycusis. CCR3 and GILZ played an important role in the pathogenesis of presbycusis, probably through regulating chemokine receptor, T-cell apoptosis, or T-cell activation pathways. To identify genes associated with human presbycusis and explore the molecular mechanism of presbycusis. Hearing function was tested by pure-tone audiometry. Microarray analysis was performed to identify presbycusis-correlated genes by Illumina Human-6 BeadChip using the peripheral blood samples of subjects. To identify biological process categories and pathways associated with presbycusis-correlated genes, bioinformatics analysis was carried out by Gene Ontology Tree Machine (GOTM) and database for annotation, visualization, and integrated discovery (DAVID). Quantitative RT-PCR (qRT-PCR) was used to validate the microarray data. Microarray analysis identified 469 up-regulated genes and 323 down-regulated genes. Both the dominant biological processes by Gene Ontology (GO) analysis and the enriched pathways by Kyoto encyclopedia of genes and genomes (KEGG) and BIOCARTA showed that genes involved in immunity and apoptosis were associated with presbycusis. In addition, CCR3, GILZ, CXCL10, and CX3CR1 genes showed consistent difference between groups for both the gene chip and qRT-PCR data. The differences of CCR3 and GILZ between presbycusis patients and controls were statistically significant (p < 0.05).

  20. Semantic based cluster content discovery in description first clustering algorithm

    International Nuclear Information System (INIS)

    Khan, M.W.; Asif, H.M.S.

    2017-01-01

    In the field of data analytics grouping of like documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have purposed. One of them is a category of algorithms which firstly group the documents on the basis of similarity and then assign the meaningful labels to those groups. Description first clustering algorithm belong to the category in which the meaningful description is deduced first and then relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm) is the algorithm of description first clustering category which is used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing); an IR (Information Retrieval) technique for induction of meaningful labels for clusters and VSM (Vector Space Model) for cluster content discovery. In this paper we present the LINGO while it is using LSI during cluster label induction and cluster content discovery phase. Finally, we compare results obtained from the said algorithm while it uses VSM and Latent semantic analysis during cluster content discovery phase. (author)

  1. The application of mass-spectrometry-based protein biomarker discovery to theragnostics

    OpenAIRE

    Street, Jonathan M; Dear, James W

    2010-01-01

    Over the last decade rapid developments in mass spectrometry have allowed the identification of multiple proteins in complex biological samples. This proteomic approach has been applied to biomarker discovery in the context of clinical pharmacology (the combination of biomarker and drug now being termed ‘theragnostics’). In this review we provide a roadmap for early protein biomarker discovery studies, focusing on some key questions that regularly confront researchers.

  2. Topic model-based mass spectrometric data analysis in cancer biomarker discovery studies.

    Science.gov (United States)

    Wang, Minkun; Tsai, Tsung-Heng; Di Poto, Cristina; Ferrarini, Alessia; Yu, Guoqiang; Ressom, Habtom W

    2016-08-18

    A fundamental challenge in quantitation of biomolecules for cancer biomarker discovery is owing to the heterogeneous nature of human biospecimens. Although this issue has been a subject of discussion in cancer genomic studies, it has not yet been rigorously investigated in mass spectrometry based proteomic and metabolomic studies. Purification of mass spectometric data is highly desired prior to subsequent analysis, e.g., quantitative comparison of the abundance of biomolecules in biological samples. We investigated topic models to computationally analyze mass spectrometric data considering both integrated peak intensities and scan-level features, i.e., extracted ion chromatograms (EICs). Probabilistic generative models enable flexible representation in data structure and infer sample-specific pure resources. Scan-level modeling helps alleviate information loss during data preprocessing. We evaluated the capability of the proposed models in capturing mixture proportions of contaminants and cancer profiles on LC-MS based serum proteomic and GC-MS based tissue metabolomic datasets acquired from patients with hepatocellular carcinoma (HCC) and liver cirrhosis as well as synthetic data we generated based on the serum proteomic data. The results we obtained by analysis of the synthetic data demonstrated that both intensity-level and scan-level purification models can accurately infer the mixture proportions and the underlying true cancerous sources with small average error ratios (data, we found more proteins and metabolites with significant changes between HCC cases and cirrhotic controls. Candidate biomarkers selected after purification yielded biologically meaningful pathway analysis results and improved disease discrimination power in terms of the area under ROC curve compared to the results found prior to purification. We investigated topic model-based inference methods to computationally address the heterogeneity issue in samples analyzed by LC/GC-MS. We observed

  3. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  4. Gene Fusion Markup Language: a prototype for exchanging gene fusion data.

    Science.gov (United States)

    Kalyana-Sundaram, Shanker; Shanmugam, Achiraman; Chinnaiyan, Arul M

    2012-10-16

    An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.

  5. Resource-estimation models and predicted discovery

    International Nuclear Information System (INIS)

    Hill, G.W.

    1982-01-01

    Resources have been estimated by predictive extrapolation from past discovery experience, by analogy with better explored regions, or by inference from evidence of depletion of targets for exploration. Changes in technology and new insights into geological mechanisms have occurred sufficiently often in the long run to form part of the pattern of mature discovery experience. The criterion, that a meaningful resource estimate needs an objective measure of its precision or degree of uncertainty, excludes 'estimates' based solely on expert opinion. This is illustrated by development of error measures for several persuasive models of discovery and production of oil and gas in USA, both annually and in terms of increasing exploration effort. Appropriate generalizations of the models resolve many points of controversy. This is illustrated using two USA data sets describing discovery of oil and of U 3 O 8 ; the latter set highlights an inadequacy of available official data. Review of the oil-discovery data set provides a warrant for adjusting the time-series prediction to a higher resource figure for USA petroleum. (author)

  6. Cell and small animal models for phenotypic drug discovery

    Directory of Open Access Journals (Sweden)

    Szabo M

    2017-06-01

    Full Text Available Mihaly Szabo,1 Sara Svensson Akusjärvi,1 Ankur Saxena,1 Jianping Liu,2 Gayathri Chandrasekar,1 Satish S Kitambi1 1Department of Microbiology Tumor, and Cell Biology, 2Department of Biochemistry and Biophysics, Karolinska Institutet, Solna, Sweden Abstract: The phenotype-based drug discovery (PDD approach is re-emerging as an alternative platform for drug discovery. This review provides an overview of the various model systems and technical advances in imaging and image analyses that strengthen the PDD platform. In PDD screens, compounds of therapeutic value are identified based on the phenotypic perturbations produced irrespective of target(s or mechanism of action. In this article, examples of phenotypic changes that can be detected and quantified with relative ease in a cell-based setup are discussed. In addition, a higher order of PDD screening setup using small animal models is also explored. As PDD screens integrate physiology and multiple signaling mechanisms during the screening process, the identified hits have higher biomedical applicability. Taken together, this review highlights the advantages gained by adopting a PDD approach in drug discovery. Such a PDD platform can complement target-based systems that are currently in practice to accelerate drug discovery. Keywords: phenotype, screening, PDD, discovery, zebrafish, drug

  7. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with an objective of isolating as many functional genes as possible from J. curcas by large scale sequencing of expressed sequence tags (ESTs. Results A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 106 clones and average insert size of the clones was 2.1 kb. Totally 12,084 ESTs were sequenced to average high quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2-23 and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes which did not show similarity with any genes in the public database may encode for unique genes. Functional classification revealed unigenes related to broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding

  8. Compositional descriptor-based recommender system for the materials discovery

    Science.gov (United States)

    Seko, Atsuto; Hayashi, Hiroyuki; Tanaka, Isao

    2018-06-01

    Structures and properties of many inorganic compounds have been collected historically. However, it only covers a very small portion of possible inorganic crystals, which implies the presence of numerous currently unknown compounds. A powerful machine-learning strategy is mandatory to discover new inorganic compounds from all chemical combinations. Herein we propose a descriptor-based recommender-system approach to estimate the relevance of chemical compositions where crystals can be formed [i.e., chemically relevant compositions (CRCs)]. In addition to data-driven compositional similarity used in the literature, the use of compositional descriptors as a prior knowledge is helpful for the discovery of new compounds. We validate our recommender systems in two ways. First, one database is used to construct a model, while another is used for the validation. Second, we estimate the phase stability for compounds at expected CRCs using density functional theory calculations.

  9. Network-based discovery through mechanistic systems biology. Implications for applications--SMEs and drug discovery: where the action is.

    Science.gov (United States)

    Benson, Neil

    2015-08-01

    Phase II attrition remains the most important challenge for drug discovery. Tackling the problem requires improved understanding of the complexity of disease biology. Systems biology approaches to this problem can, in principle, deliver this. This article reviews the reports of the application of mechanistic systems models to drug discovery questions and discusses the added value. Although we are on the journey to the virtual human, the length, path and rate of learning from this remain an open question. Success will be dependent on the will to invest and make the most of the insight generated along the way. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. History of the discovery of uranium fission by neutrons

    International Nuclear Information System (INIS)

    Simane, C.

    1989-01-01

    The history is briefly described of the discovery of uranium fission by neutrons, based on the texts of original scientific studies, memories or biographies of those who participated in the discovery and of their contemporaries. Obstacles that stood in the way of the discovery are discussed. It is stated that only a few scientists contributed to the discovery of uranium fission. The fission process itself still remains subject of physical research which studies its detailed laws. (Z.M.). 2 tabs., 16 refs

  11. A Smart Web-Based Geospatial Data Discovery System with Oceanographic Data as an Example

    Directory of Open Access Journals (Sweden)

    Yongyao Jiang

    2018-02-01

    Full Text Available Discovering and accessing geospatial data presents a significant challenge for the Earth sciences community as massive amounts of data are being produced on a daily basis. In this article, we report a smart web-based geospatial data discovery system that mines and utilizes data relevancy from metadata user behavior. Specifically, (1 the system enables semantic query expansion and suggestion to assist users in finding more relevant data; (2 machine-learned ranking is utilized to provide the optimal search ranking based on a number of identified ranking features that can reflect users’ search preferences; (3 a hybrid recommendation module is designed to allow users to discover related data considering metadata attributes and user behavior; (4 an integrated graphic user interface design is developed to quickly and intuitively guide data consumers to the appropriate data resources. As a proof of concept, we focus on a well-defined domain-oceanography and use oceanographic data discovery as an example. Experiments and a search example show that the proposed system can improve the scientific community’s data search experience by providing query expansion, suggestion, better search ranking, and data recommendation via a user-friendly interface.

  12. Human transporter database: comprehensive knowledge and discovery tools in the human transporter genes.

    Directory of Open Access Journals (Sweden)

    Adam Y Ye

    Full Text Available Transporters are essential in homeostatic exchange of endogenous and exogenous substances at the systematic, organic, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetics traits. Recent developments in high throughput technologies on genomics, transcriptomics and proteomics allow in depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high throughput data have resulted in urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible way. Using a pipeline with the combination of automated keywords query, sequence similarity search and manual curation on transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD (http://htd.cbi.pku.edu.cn. Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relationships with their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to facilitate research communities to search detailed molecular and genetic information of transporters for development of personalized medicine.

  13. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

    KAUST Repository

    AlShahrani, Mona; Hoehndorf, Robert

    2018-01-01

    In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprising of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.

  14. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

    KAUST Repository

    Alshahrani, Mona

    2018-04-30

    In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease\\'s (or patient\\'s) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprising of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.

  15. Discovery of a novel gene involved in autolysis of Clostridium cells.

    Science.gov (United States)

    Yang, Liejian; Bao, Guanhui; Zhu, Yan; Dong, Hongjun; Zhang, Yanping; Li, Yin

    2013-06-01

    Cell autolysis plays important physiological roles in the life cycle of clostridial cells. Understanding the genetic basis of the autolysis phenomenon of pathogenic Clostridium or solvent producing Clostridium cells might provide new insights into this important species. Genes that might be involved in autolysis of Clostridium acetobutylicum, a model clostridial species, were investigated in this study. Twelve putative autolysin genes were predicted in C. acetobutylicum DSM 1731 genome through bioinformatics analysis. Of these 12 genes, gene SMB_G3117 was selected for testing the in tracellular autolysin activity, growth profile, viable cell numbers, and cellular morphology. We found that overexpression of SMB_G3117 gene led to earlier ceased growth, significantly increased number of dead cells, and clear electrolucent cavities, while disruption of SMB_G3117 gene exhibited remarkably reduced intracellular autolysin activity. These results indicate that SMB_G3117 is a novel gene involved in cellular autolysis of C. acetobutylicum.

  16. Pharmacological screening technologies for venom peptide discovery.

    Science.gov (United States)

    Prashanth, Jutty Rajan; Hasaballah, Nojod; Vetter, Irina

    2017-12-01

    Venomous animals occupy one of the most successful evolutionary niches and occur on nearly every continent. They deliver venoms via biting and stinging apparatuses with the aim to rapidly incapacitate prey and deter predators. This has led to the evolution of venom components that act at a number of biological targets - including ion channels, G-protein coupled receptors, transporters and enzymes - with exquisite selectivity and potency, making venom-derived components attractive pharmacological tool compounds and drug leads. In recent years, plate-based pharmacological screening approaches have been introduced to accelerate venom-derived drug discovery. A range of assays are amenable to this purpose, including high-throughput electrophysiology, fluorescence-based functional and binding assays. However, despite these technological advances, the traditional activity-guided fractionation approach is time-consuming and resource-intensive. The combination of screening techniques suitable for miniaturization with sequence-based discovery approaches - supported by advanced proteomics, mass spectrometry, chromatography as well as synthesis and expression techniques - promises to further improve venom peptide discovery. Here, we discuss practical aspects of establishing a pipeline for venom peptide drug discovery with a particular emphasis on pharmacology and pharmacological screening approaches. This article is part of the Special Issue entitled 'Venom-derived Peptides as Pharmacological Tools.' Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Matrix- and tensor-based recommender systems for the discovery of currently unknown inorganic compounds

    Science.gov (United States)

    Seko, Atsuto; Hayashi, Hiroyuki; Kashima, Hisashi; Tanaka, Isao

    2018-01-01

    Chemically relevant compositions (CRCs) and atomic arrangements of inorganic compounds have been collected as inorganic crystal structure databases. Machine learning is a unique approach to search for currently unknown CRCs from vast candidates. Herein we propose matrix- and tensor-based recommender system approaches to predict currently unknown CRCs from database entries of CRCs. Firstly, the performance of the recommender system approaches to discover currently unknown CRCs is examined. A Tucker decomposition recommender system shows the best discovery rate of CRCs as the majority of the top 100 recommended ternary and quaternary compositions correspond to CRCs. Secondly, systematic density functional theory (DFT) calculations are performed to investigate the phase stability of the recommended compositions. The phase stability of the 27 compositions reveals that 23 currently unknown compounds are newly found to be stable. These results indicate that the recommender system has great potential to accelerate the discovery of new compounds.

  18. Alteration of plant meristem function by manipulation of the Retinoblastoma-like plant RRB gene

    Science.gov (United States)

    Durfee, Tim [Madison, WI; Feiler, Heidi [Albany, CA; Gruissem, Wilhelm [Forch, CH; Jenkins, Susan [Martinez, CA; Roe, Judith [Manhattan, KS; Zambryski, Patricia [Berkeley, CA

    2007-01-16

    This invention provides methods and compositions for altering the growth, organization, and differentiation of plant tissues. The invention is based on the discovery that, in plants, genetically altering the levels of Retinoblastoma-related gene (RRB) activity produces dramatic effects on the growth, proliferation, organization, and differentiation of plant meristem.

  19. Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

    Science.gov (United States)

    Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

    2015-02-01

    Microarray involves of placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has made a novelty discovery since its development and obtained an increasing attention among researchers. The widespread of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. The microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and become singular, thus it cannot be inverted. In this research, the Hotelling's T2 statistic is combined with a shrinkage approach as an alternative estimation to estimate the covariance matrix to detect significant gene sets. The use of shrinkage covariance matrix overcomes the singularity problem by converting an unbiased to an improved biased estimator of covariance matrix. Robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increases its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.

  20. An experimental evaluation of passage-based process discovery

    NARCIS (Netherlands)

    Verbeek, H.M.W.; Aalst, van der W.M.P.; La Rosa, M.; Soffer, P.

    2013-01-01

    In the area of process mining, the ILP Miner is known for the fact that it always returns a Petri net that perfectly fits a given event log. Like for most process discovery algorithms, its complexity is linear in the size of the event log and exponential in the number of event classes (i.e.,

  1. Hierarchical virtual screening approaches in small molecule drug discovery.

    Science.gov (United States)

    Kumar, Ashutosh; Zhang, Kam Y J

    2015-01-01

    Virtual screening has played a significant role in the discovery of small molecule inhibitors of therapeutic targets in last two decades. Various ligand and structure-based virtual screening approaches are employed to identify small molecule ligands for proteins of interest. These approaches are often combined in either hierarchical or parallel manner to take advantage of the strength and avoid the limitations associated with individual methods. Hierarchical combination of ligand and structure-based virtual screening approaches has received noteworthy success in numerous drug discovery campaigns. In hierarchical virtual screening, several filters using ligand and structure-based approaches are sequentially applied to reduce a large screening library to a number small enough for experimental testing. In this review, we focus on different hierarchical virtual screening strategies and their application in the discovery of small molecule modulators of important drug targets. Several virtual screening studies are discussed to demonstrate the successful application of hierarchical virtual screening in small molecule drug discovery. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Using transcriptomics to guide lead optimization in drug discovery projects: Lessons learned from the QSTAR project.

    Science.gov (United States)

    Verbist, Bie; Klambauer, Günter; Vervoort, Liesbet; Talloen, Willem; Shkedy, Ziv; Thas, Olivier; Bender, Andreas; Göhlmann, Hinrich W H; Hochreiter, Sepp

    2015-05-01

    The pharmaceutical industry is faced with steadily declining R&D efficiency which results in fewer drugs reaching the market despite increased investment. A major cause for this low efficiency is the failure of drug candidates in late-stage development owing to safety issues or previously undiscovered side-effects. We analyzed to what extent gene expression data can help to de-risk drug development in early phases by detecting the biological effects of compounds across disease areas, targets and scaffolds. For eight drug discovery projects within a global pharmaceutical company, gene expression data were informative and able to support go/no-go decisions. Our studies show that gene expression profiling can detect adverse effects of compounds, and is a valuable tool in early-stage drug discovery decision making. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  3. Beyond information retrieval: information discovery and multimedia information retrieval

    OpenAIRE

    Roberto Raieli

    2017-01-01

    The paper compares the current methodologies for search and discovery of information and information resources: terminological search and term-based language, own of information retrieval (IR); semantic search and information discovery, being developed mainly through the language of linked data; semiotic search and content-based language, experienced by multimedia information retrieval (MIR).MIR semiotic methodology is, then, detailed.

  4. "Drug" Discovery with the Help of Organic Chemistry.

    Science.gov (United States)

    Itoh, Yukihiro; Suzuki, Takayoshi

    2017-01-01

    The first step in "drug" discovery is to find compounds binding to a potential drug target. In modern medicinal chemistry, the screening of a chemical library, structure-based drug design, and ligand-based drug design, or a combination of these methods, are generally used for identifying the desired compounds. However, they do not necessarily lead to success and there is no infallible method for drug discovery. Therefore, it is important to explore medicinal chemistry based on not only the conventional methods but also new ideas. So far, we have found various compounds as drug candidates. In these studies, some strategies based on organic chemistry have allowed us to find drug candidates, through 1) construction of a focused library using organic reactions and 2) rational design of enzyme inhibitors based on chemical reactions catalyzed by the target enzyme. Medicinal chemistry based on organic chemical reactions could be expected to supplement the conventional methods. In this review, we present drug discovery with the help of organic chemistry showing examples of our explorative studies on histone deacetylase inhibitors and lysine-specific demethylase 1 inhibitors.

  5. Characterization of Genes for Beef Marbling Based on Applying Gene Coexpression Network

    Directory of Open Access Journals (Sweden)

    Dajeong Lim

    2014-01-01

    Full Text Available Marbling is an important trait in characterization beef quality and a major factor for determining the price of beef in the Korean beef market. In particular, marbling is a complex trait and needs a system-level approach for identifying candidate genes related to the trait. To find the candidate gene associated with marbling, we used a weighted gene coexpression network analysis from the expression value of bovine genes. Hub genes were identified; they were topologically centered with large degree and BC values in the global network. We performed gene expression analysis to detect candidate genes in M. longissimus with divergent marbling phenotype (marbling scores 2 to 7 using qRT-PCR. The results demonstrate that transmembrane protein 60 (TMEM60 and dihydropyrimidine dehydrogenase (DPYD are associated with increasing marbling fat. We suggest that the network-based approach in livestock may be an important method for analyzing the complex effects of candidate genes associated with complex traits like marbling or tenderness.

  6. Transient transformation meets gene function discovery: the strawberry fruit case

    Directory of Open Access Journals (Sweden)

    Michela eGuidarelli

    2015-06-01

    Full Text Available Beside the well known nutritional and health benefits, strawberry (Fragaria X ananassa crop draws increasing attention as plant model system for the Rosaceae family, due to the short generation time, the rapid in vitro regeneration, and to the availability of the genome sequence of F. X ananassa and of the closely related F. vesca species. In the last years, the use of high-throughput sequence technologies provided large amounts of molecular information on the genes possibly related to several biological processes of this crop. Nevertheless, the function of most genes or gene products is still poorly understood and needs investigation. Transient transformation technology provides a powerful tool to study gene function in vivo, avoiding difficult drawbacks that typically affect the stable transformation protocols, such as transformation efficiency, transformants selection and regeneration. In this review we provide an overview of the use of transient expression in the investigation of the function of genes important for strawberry fruit development, defence and nutritional properties. The technical aspects related to an efficient use of this technique are described, and the possible impact and application in strawberry crop improvement are discussed.

  7. RNA Editing and Drug Discovery for Cancer Therapy

    Directory of Open Access Journals (Sweden)

    Wei-Hsuan Huang

    2013-01-01

    Full Text Available RNA editing is vital to provide the RNA and protein complexity to regulate the gene expression. Correct RNA editing maintains the cell function and organism development. Imbalance of the RNA editing machinery may lead to diseases and cancers. Recently, RNA editing has been recognized as a target for drug discovery although few studies targeting RNA editing for disease and cancer therapy were reported in the field of natural products. Therefore, RNA editing may be a potential target for therapeutic natural products. In this review, we provide a literature overview of the biological functions of RNA editing on gene expression, diseases, cancers, and drugs. The bioinformatics resources of RNA editing were also summarized.

  8. The Goal Specificity Effect on Strategy Use and Instructional Efficiency during Computer-Based Scientific Discovery Learning

    Science.gov (United States)

    Kunsting, Josef; Wirth, Joachim; Paas, Fred

    2011-01-01

    Using a computer-based scientific discovery learning environment on buoyancy in fluids we investigated the "effects of goal specificity" (nonspecific goals vs. specific goals) for two goal types (problem solving goals vs. learning goals) on "strategy use" and "instructional efficiency". Our empirical findings close an important research gap,…

  9. Mobile STEMship Discovery Center: K-12 Aerospace-Based Science, Technology, Engineering, and Mathematics (STEM) Mobile Teaching Vehicle

    Science.gov (United States)

    2015-08-03

    AND SUBTITLE Mobile STEMship Discovery Center: K-12 Aerospace-Based Science, Technology, Engineering, and Mathematics (STEM) Mobile Teaching Vehicle...Center program to be able to expose Science Technology, Engineering and Mathematics (STEM) space-inspired science centers for DC Metro beltway schools

  10. A Model-Based Joint Identification of Differentially Expressed Genes and Phenotype-Associated Genes

    Science.gov (United States)

    Seo, Minseok; Shin, Su-kyung; Kwon, Eun-Young; Kim, Sung-Eun; Bae, Yun-Jung; Lee, Seungyeoun; Sung, Mi-Kyung; Choi, Myung-Sook; Park, Taesung

    2016-01-01

    Over the last decade, many analytical methods and tools have been developed for microarray data. The detection of differentially expressed genes (DEGs) among different treatment groups is often a primary purpose of microarray data analysis. In addition, association studies investigating the relationship between genes and a phenotype of interest such as survival time are also popular in microarray data analysis. Phenotype association analysis provides a list of phenotype-associated genes (PAGs). However, it is sometimes necessary to identify genes that are both DEGs and PAGs. We consider the joint identification of DEGs and PAGs in microarray data analyses. The first approach we used was a naïve approach that detects DEGs and PAGs separately and then identifies the genes in an intersection of the list of PAGs and DEGs. The second approach we considered was a hierarchical approach that detects DEGs first and then chooses PAGs from among the DEGs or vice versa. In this study, we propose a new model-based approach for the joint identification of DEGs and PAGs. Unlike the previous two-step approaches, the proposed method identifies genes simultaneously that are DEGs and PAGs. This method uses standard regression models but adopts different null hypothesis from ordinary regression models, which allows us to perform joint identification in one-step. The proposed model-based methods were evaluated using experimental data and simulation studies. The proposed methods were used to analyze a microarray experiment in which the main interest lies in detecting genes that are both DEGs and PAGs, where DEGs are identified between two diet groups and PAGs are associated with four phenotypes reflecting the expression of leptin, adiponectin, insulin-like growth factor 1, and insulin. Model-based approaches provided a larger number of genes, which are both DEGs and PAGs, than other methods. Simulation studies showed that they have more power than other methods. Through analysis of

  11. A Model-Based Joint Identification of Differentially Expressed Genes and Phenotype-Associated Genes.

    Directory of Open Access Journals (Sweden)

    Samuel Sunghwan Cho

    Full Text Available Over the last decade, many analytical methods and tools have been developed for microarray data. The detection of differentially expressed genes (DEGs among different treatment groups is often a primary purpose of microarray data analysis. In addition, association studies investigating the relationship between genes and a phenotype of interest such as survival time are also popular in microarray data analysis. Phenotype association analysis provides a list of phenotype-associated genes (PAGs. However, it is sometimes necessary to identify genes that are both DEGs and PAGs. We consider the joint identification of DEGs and PAGs in microarray data analyses. The first approach we used was a naïve approach that detects DEGs and PAGs separately and then identifies the genes in an intersection of the list of PAGs and DEGs. The second approach we considered was a hierarchical approach that detects DEGs first and then chooses PAGs from among the DEGs or vice versa. In this study, we propose a new model-based approach for the joint identification of DEGs and PAGs. Unlike the previous two-step approaches, the proposed method identifies genes simultaneously that are DEGs and PAGs. This method uses standard regression models but adopts different null hypothesis from ordinary regression models, which allows us to perform joint identification in one-step. The proposed model-based methods were evaluated using experimental data and simulation studies. The proposed methods were used to analyze a microarray experiment in which the main interest lies in detecting genes that are both DEGs and PAGs, where DEGs are identified between two diet groups and PAGs are associated with four phenotypes reflecting the expression of leptin, adiponectin, insulin-like growth factor 1, and insulin. Model-based approaches provided a larger number of genes, which are both DEGs and PAGs, than other methods. Simulation studies showed that they have more power than other methods

  12. Genes2FANs: connecting genes through functional association networks

    Science.gov (United States)

    2012-01-01

    Background Protein-protein, cell signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results Genes2FANs is a web based tool and a database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols to produce a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and then the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user’s PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the counts of links connecting disease genes through PPI and links connecting diseases genes through FANs, separating diseases into two categories. Conclusions Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and help form hypotheses for further experimentation. Our finding that disease genes in

  13. Computer-Aided Drug Discovery in Plant Pathology.

    Science.gov (United States)

    Shanmugam, Gnanendra; Jeon, Junhyun

    2017-12-01

    Control of plant diseases is largely dependent on use of agrochemicals. However, there are widening gaps between our knowledge on plant diseases gained from genetic/mechanistic studies and rapid translation of the knowledge into target-oriented development of effective agrochemicals. Here we propose that the time is ripe for computer-aided drug discovery/design (CADD) in molecular plant pathology. CADD has played a pivotal role in development of medically important molecules over the last three decades. Now, explosive increase in information on genome sequences and three dimensional structures of biological molecules, in combination with advances in computational and informational technologies, opens up exciting possibilities for application of CADD in discovery and development of agrochemicals. In this review, we outline two categories of the drug discovery strategies: structure- and ligand-based CADD, and relevant computational approaches that are being employed in modern drug discovery. In order to help readers to dive into CADD, we explain concepts of homology modelling, molecular docking, virtual screening, and de novo ligand design in structure-based CADD, and pharmacophore modelling, ligand-based virtual screening, quantitative structure activity relationship modelling and de novo ligand design for ligand-based CADD. We also provide the important resources available to carry out CADD. Finally, we present a case study showing how CADD approach can be implemented in reality for identification of potent chemical compounds against the important plant pathogens, Pseudomonas syringae and Colletotrichum gloeosporioides .

  14. Effective Online Group Discovery in Trajectory Databases

    DEFF Research Database (Denmark)

    Li, Xiaohui; Ceikute, Vaida; Jensen, Christian S.

    2013-01-01

    GPS-enabled devices are pervasive nowadays. Finding movement patterns in trajectory data stream is gaining in importance. We propose a group discovery framework that aims to efficiently support the online discovery of moving objects that travel together. The framework adopts a sampling-independen......GPS-enabled devices are pervasive nowadays. Finding movement patterns in trajectory data stream is gaining in importance. We propose a group discovery framework that aims to efficiently support the online discovery of moving objects that travel together. The framework adopts a sampling......-independent approach that makes no assumptions about when positions are sampled, gives no special importance to sampling points, and naturally supports the use of approximate trajectories. The framework's algorithms exploit state-of-the-art, density-based clustering (DBScan) to identify groups. The groups are scored...

  15. The biological knowledge discovery by PCCF measure and PCA-F projection.

    Science.gov (United States)

    Jia, Xingang; Zhu, Guanqun; Han, Qiuhong; Lu, Zuhong

    2017-01-01

    In the process of biological knowledge discovery, PCA is commonly used to complement the clustering analysis, but PCA typically gives the poor visualizations for most gene expression data sets. Here, we propose a PCCF measure, and use PCA-F to display clusters of PCCF, where PCCF and PCA-F are modeled from the modified cumulative probabilities of genes. From the analysis of simulated and experimental data sets, we demonstrate that PCCF is more appropriate and reliable for analyzing gene expression data compared to other commonly used distances or similarity measures, and PCA-F is a good visualization technique for identifying clusters of PCCF, where we aim at such data sets that the expression values of genes are collected at different time points.

  16. Development of gene diagnosis for diabetes and cholecystis based on gene analysis of CCK-A receptor

    International Nuclear Information System (INIS)

    Kono, Akira

    1998-01-01

    The gene structures of CCK, A type receptor in human, the rat and the mouse were investigated aiming to clarify that the aberration of the gene is involved in the incidences of diabetes and cholecystis. In this fiscal year, 1997, the normal structure of the gene and the accurate base sequence were analyzed using DNA fragments bound to 32 P-labelled cDNA of human CCKAR originated from the gene library of leucocyte. This gene contained about 2.2 x 10 5 base pairs and the base sequence was completely determined and registered to Japan DNA data bank (D85606). In addition, the genome structures and base sequences of mouse and rat CCKAR were analyzed and registered (D 85605 and D 50608, respectively). The differences in the base sequence of CCKAR among the species were found in the promotor region and the intron regions, suggesting that there might be differences in splicing among species. (M.N.)

  17. Microarray-Based Gene Expression Analysis for Veterinary Pathologists: A Review.

    Science.gov (United States)

    Raddatz, Barbara B; Spitzbarth, Ingo; Matheis, Katja A; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang; Ulrich, Reiner

    2017-09-01

    High-throughput, genome-wide transcriptome analysis is now commonly used in all fields of life science research and is on the cusp of medical and veterinary diagnostic application. Transcriptomic methods such as microarrays and next-generation sequencing generate enormous amounts of data. The pathogenetic expertise acquired from understanding of general pathology provides veterinary pathologists with a profound background, which is essential in translating transcriptomic data into meaningful biological knowledge, thereby leading to a better understanding of underlying disease mechanisms. The scientific literature concerning high-throughput data-mining techniques usually addresses mathematicians or computer scientists as the target audience. In contrast, the present review provides the reader with a clear and systematic basis from a veterinary pathologist's perspective. Therefore, the aims are (1) to introduce the reader to the necessary methodological background; (2) to introduce the sequential steps commonly performed in a microarray analysis including quality control, annotation, normalization, selection of differentially expressed genes, clustering, gene ontology and pathway analysis, analysis of manually selected genes, and biomarker discovery; and (3) to provide references to publically available and user-friendly software suites. In summary, the data analysis methods presented within this review will enable veterinary pathologists to analyze high-throughput transcriptome data obtained from their own experiments, supplemental data that accompany scientific publications, or public repositories in order to obtain a more in-depth insight into underlying disease mechanisms.

  18. GSEH: A Novel Approach to Select Prostate Cancer-Associated Genes Using Gene Expression Heterogeneity.

    Science.gov (United States)

    Kim, Hyunjin; Choi, Sang-Min; Park, Sanghyun

    2018-01-01

    When a gene shows varying levels of expression among normal people but similar levels in disease patients or shows similar levels of expression among normal people but different levels in disease patients, we can assume that the gene is associated with the disease. By utilizing this gene expression heterogeneity, we can obtain additional information that abets discovery of disease-associated genes. In this study, we used collaborative filtering to calculate the degree of gene expression heterogeneity between classes and then scored the genes on the basis of the degree of gene expression heterogeneity to find "differentially predicted" genes. Through the proposed method, we discovered more prostate cancer-associated genes than 10 comparable methods. The genes prioritized by the proposed method are potentially significant to biological processes of a disease and can provide insight into them.

  19. Enhanced gene ranking approaches using modified trace ratio algorithm for gene expression data

    Directory of Open Access Journals (Sweden)

    Shruti Mishra

    Full Text Available Microarray technology enables the understanding and investigation of gene expression levels by analyzing high dimensional datasets that contain few samples. Over time, microarray expression data have been collected for studying the underlying biological mechanisms of disease. One such application for understanding the mechanism is by constructing a gene regulatory network (GRN. One of the foremost key criteria for GRN discovery is gene selection. Choosing a generous set of genes for the structure of the network is highly desirable. For this role, two suitable methods were proposed for selection of appropriate genes. The first approach comprises a gene selection method called Information gain, where the dataset is reformed and fused with another distinct algorithm called Trace Ratio (TR. Our second method is the implementation of our projected modified TR algorithm, where the scoring base for finding weight matrices has been re-designed. Both the methods' efficiency was shown with different classifiers that include variants of the Artificial Neural Network classifier, such as Resilient Propagation, Quick Propagation, Back Propagation, Manhattan Propagation and Radial Basis Function Neural Network and also the Support Vector Machine (SVM classifier. In the study, it was confirmed that both of the proposed methods worked well and offered high accuracy with a lesser number of iterations as compared to the original Trace Ratio algorithm. Keywords: Gene regulatory network, Gene selection, Information gain, Trace ratio, Canonical correlation analysis, Classification

  20. Cluster-based service discovery for heterogeneous wireless sensor networks

    NARCIS (Netherlands)

    Marin Perianu, Raluca; Scholten, Johan; Havinga, Paul J.M.; Hartel, Pieter H.

    2007-01-01

    We propose an energy-efficient service discovery protocol for heterogeneous wireless sensor networks. Our solution exploits a cluster overlay, where the clusterhead nodes form a distributed service registry. A service lookup results in visiting only the clusterhead nodes. We aim for minimizing the

  1. Rough – Granular Computing knowledge discovery models

    Directory of Open Access Journals (Sweden)

    Mohammed M. Eissa

    2016-11-01

    Full Text Available Medical domain has become one of the most important areas of research in order to richness huge amounts of medical information about the symptoms of diseases and how to distinguish between them to diagnose it correctly. Knowledge discovery models play vital role in refinement and mining of medical indicators to help medical experts to settle treatment decisions. This paper introduces four hybrid Rough – Granular Computing knowledge discovery models based on Rough Sets Theory, Artificial Neural Networks, Genetic Algorithm and Rough Mereology Theory. A comparative analysis of various knowledge discovery models that use different knowledge discovery techniques for data pre-processing, reduction, and data mining supports medical experts to extract the main medical indicators, to reduce the misdiagnosis rates and to improve decision-making for medical diagnosis and treatment. The proposed models utilized two medical datasets: Coronary Heart Disease dataset and Hepatitis C Virus dataset. The main purpose of this paper was to explore and evaluate the proposed models based on Granular Computing methodology for knowledge extraction according to different evaluation criteria for classification of medical datasets. Another purpose is to make enhancement in the frame of KDD processes for supervised learning using Granular Computing methodology.

  2. Twentieth Anniversary of Leptin discovery and the Approval of Myalept by FDA

    Directory of Open Access Journals (Sweden)

    Ata Mahmoodpoor

    2015-04-01

    Full Text Available Leptin is a 16 kDa hormone that is mainly expressed in adipose tissues (1. The major target of leptin is hypothalamus and it suppresses food intake and energy consumption, consequently diminishing adipose deposits and body weight (2, 3. The OB gene was isolated by Friedman in 1994 (4.  Based on the suggestion of Roger Guillemin, Friedman named this new hormone "leptin" from the Greek lepto meaning thin (5, 6. Since leptin discovery, numerous studies have been conducted on its physiological effects and its function in pathological conditions. Most of studies on leptin concentrated on its metabolic actions (7, receptors (8 and further broad functions such as immunity modulation (9 and memory processing (10. Considering such a vast range of functions, it is clear that patients with lack of leptin physiologically need pharmacological interventions. At this moment, we are in the twentieth year of leptin discovery. Finally, FDA approved a drug named Myalept (metreleptin for injection on February 2014 to treat rare metabolic disease caused by leptin deficiency. Congenital generalized lipodystrophy is a disorder with partial lack of fat tissues (11. The trial for the safety and effectiveness of Myalept demonstrated decrease in HbA1c, fasting blood glucose, and triglycerides (11. Nevertheless, there are some limitations to the usage of Myalept in HIV-related lipodystrophy and some metabolic disorders (11. Moreover, it may increase the risk of lymphoma by producing anti-metreleptin antibodies neutralizing endogenous leptin. Considering these concerns, Myalept is available only through a limited profile under a Risk Evaluation and Mitigation Strategy (REMS. Myalept is contraindicated in patients with general obesity not related to congenital leptin deficiency (12. Even though, Myalept has very limited indications for use in general population, it is considered a milestone towards the discovery of novel treatments for Leptin deficiencies and disorders

  3. Literature-based discovery of diabetes- and ROS-related targets

    Directory of Open Access Journals (Sweden)

    Pande Manjusha

    2010-10-01

    Full Text Available Abstract Background Reactive oxygen species (ROS are known mediators of cellular damage in multiple diseases including diabetic complications. Despite its importance, no comprehensive database is currently available for the genes associated with ROS. Methods We present ROS- and diabetes-related targets (genes/proteins collected from the biomedical literature through a text mining technology. A web-based literature mining tool, SciMiner, was applied to 1,154 biomedical papers indexed with diabetes and ROS by PubMed to identify relevant targets. Over-represented targets in the ROS-diabetes literature were obtained through comparisons against randomly selected literature. The expression levels of nine genes, selected from the top ranked ROS-diabetes set, were measured in the dorsal root ganglia (DRG of diabetic and non-diabetic DBA/2J mice in order to evaluate the biological relevance of literature-derived targets in the pathogenesis of diabetic neuropathy. Results SciMiner identified 1,026 ROS- and diabetes-related targets from the 1,154 biomedical papers (http://jdrf.neurology.med.umich.edu/ROSDiabetes/. Fifty-three targets were significantly over-represented in the ROS-diabetes literature compared to randomly selected literature. These over-represented targets included well-known members of the oxidative stress response including catalase, the NADPH oxidase family, and the superoxide dismutase family of proteins. Eight of the nine selected genes exhibited significant differential expression between diabetic and non-diabetic mice. For six genes, the direction of expression change in diabetes paralleled enhanced oxidative stress in the DRG. Conclusions Literature mining compiled ROS-diabetes related targets from the biomedical literature and led us to evaluate the biological relevance of selected targets in the pathogenesis of diabetic neuropathy.

  4. Computational Identification of Novel Genes: Current and Future Perspectives.

    Science.gov (United States)

    Klasberg, Steffen; Bitard-Feildel, Tristan; Mallet, Ludovic

    2016-01-01

    While it has long been thought that all genomic novelties are derived from the existing material, many genes lacking homology to known genes were found in recent genome projects. Some of these novel genes were proposed to have evolved de novo, ie, out of noncoding sequences, whereas some have been shown to follow a duplication and divergence process. Their discovery called for an extension of the historical hypotheses about gene origination. Besides the theoretical breakthrough, increasing evidence accumulated that novel genes play important roles in evolutionary processes, including adaptation and speciation events. Different techniques are available to identify genes and classify them as novel. Their classification as novel is usually based on their similarity to known genes, or lack thereof, detected by comparative genomics or against databases. Computational approaches are further prime methods that can be based on existing models or leveraging biological evidences from experiments. Identification of novel genes remains however a challenging task. With the constant software and technologies updates, no gold standard, and no available benchmark, evaluation and characterization of genomic novelty is a vibrant field. In this review, the classical and state-of-the-art tools for gene prediction are introduced. The current methods for novel gene detection are presented; the methodological strategies and their limits are discussed along with perspective approaches for further studies.

  5. Perbandingan antara Keefektifan Model Guided Discovery Learning dan Project-Based Learning pada Matakuliah Geometri

    Directory of Open Access Journals (Sweden)

    Okky Riswandha Imawan

    2015-12-01

    Abstract This research aims to describe the effectiveness and effectiveness differences of the Guided Discovery Learning (GDL Model and the Project Based Learning (PjBL Model in terms of achievement, self-confidence, and critical thinking skills of students on the Solid Geometry subjects. This research was quasi experimental. The research subjects were two undergraduate classes of Mathematics Education Program, Ahmad Dahlan University, in their second semester, established at random. The data analysis to test the effectiveness of the GDL and PjBL Models in terms of each of the dependent variables used the t-test. The data analysis to test differences between effectiveness of the GDL and that of the PjBL Model used the MANOVA test. The results of this research show that viewed from achievement, self confidence, and critical thinking skills of the students are the application of the GDL Model on Solid Geometry subject is effective, the application of the PjBL Model on Solid Geometry subject is effective, and there is no difference in the effectiveness of GDL and PjBL Models on Solid Geometry subject in terms of achievement, self confidence, and critical thinking skills of the students. Keywords: guided discovery learning model, project-based learning model, achievement, self-confidence, critical thinking skills

  6. PCR-based detection of gene transfer vectors: application to gene doping surveillance.

    Science.gov (United States)

    Perez, Irene C; Le Guiner, Caroline; Ni, Weiyi; Lyles, Jennifer; Moullier, Philippe; Snyder, Richard O

    2013-12-01

    Athletes who illicitly use drugs to enhance their athletic performance are at risk of being banned from sports competitions. Consequently, some athletes may seek new doping methods that they expect to be capable of circumventing detection. With advances in gene transfer vector design and therapeutic gene transfer, and demonstrations of safety and therapeutic benefit in humans, there is an increased probability of the pursuit of gene doping by athletes. In anticipation of the potential for gene doping, assays have been established to directly detect complementary DNA of genes that are top candidates for use in doping, as well as vector control elements. The development of molecular assays that are capable of exposing gene doping in sports can serve as a deterrent and may also identify athletes who have illicitly used gene transfer for performance enhancement. PCR-based methods to detect foreign DNA with high reliability, sensitivity, and specificity include TaqMan real-time PCR, nested PCR, and internal threshold control PCR.

  7. A cell-based fluorescent glucose transporter assay for SGLT2 inhibitor discovery

    Directory of Open Access Journals (Sweden)

    Yi Huan

    2013-04-01

    Full Text Available The sodium/glucose cotransporter 2 (SGLT2 is responsible for the majority of glucose reabsorption in the kidney, and currently, SGLT2 inhibitors are considered as promising hypoglycemic agents for the treatment of type 2 diabetes mellitus. By constructing CHO cell lines that stably express the human SGLT2 transmembrane protein, along with a fluorescent glucose transporter assay that uses 2-[N-(7-nitrobenz-2-oxa-1,3-diazol-4-ylamino]2-deoxyglucose (2-NBDG as a glucose analog, we have developed a nonradioactive, cell-based assay for the discovery and characterization of SGLT2 inhibitors.

  8. Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus.

    Science.gov (United States)

    Woo, Patrick C Y; Lau, Susanna K P; Lam, Carol S F; Lau, Candy C Y; Tsang, Alan K L; Lau, John H N; Bai, Ru; Teng, Jade L L; Tsang, Chris C C; Wang, Ming; Zheng, Bo-Jian; Chan, Kwok-Hung; Yuen, Kwok-Yung

    2012-04-01

    Recently, we reported the discovery of three novel coronaviruses, bulbul coronavirus HKU11, thrush coronavirus HKU12, and munia coronavirus HKU13, which were identified as representatives of a novel genus, Deltacoronavirus, in the subfamily Coronavirinae. In this territory-wide molecular epidemiology study involving 3,137 mammals and 3,298 birds, we discovered seven additional novel deltacoronaviruses in pigs and birds, which we named porcine coronavirus HKU15, white-eye coronavirus HKU16, sparrow coronavirus HKU17, magpie robin coronavirus HKU18, night heron coronavirus HKU19, wigeon coronavirus HKU20, and common moorhen coronavirus HKU21. Complete genome sequencing and comparative genome analysis showed that the avian and mammalian deltacoronaviruses have similar genome characteristics and structures. They all have relatively small genomes (25.421 to 26.674 kb), the smallest among all coronaviruses. They all have a single papain-like protease domain in the nsp3 gene; an accessory gene, NS6 open reading frame (ORF), located between the M and N genes; and a variable number of accessory genes (up to four) downstream of the N gene. Moreover, they all have the same putative transcription regulatory sequence of ACACCA. Molecular clock analysis showed that the most recent common ancestor of all coronaviruses was estimated at approximately 8100 BC, and those of Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus were at approximately 2400 BC, 3300 BC, 2800 BC, and 3000 BC, respectively. From our studies, it appears that bats and birds, the warm blooded flying vertebrates, are ideal hosts for the coronavirus gene source, bats for Alphacoronavirus and Betacoronavirus and birds for Gammacoronavirus and Deltacoronavirus, to fuel coronavirus evolution and dissemination.

  9. Bioinformatics Tools for the Discovery of New Nonribosomal Peptides

    DEFF Research Database (Denmark)

    Leclère, Valérie; Weber, Tilmann; Jacques, Philippe

    2016-01-01

    -dimensional structure of the peptides can be compared with the structural patterns of all known NRPs. The presented workflow leads to an efficient and rapid screening of genomic data generated by high throughput technologies. The exploration of such sequenced genomes may lead to the discovery of new drugs (i......This chapter helps in the use of bioinformatics tools relevant to the discovery of new nonribosomal peptides (NRPs) produced by microorganisms. The strategy described can be applied to draft or fully assembled genome sequences. It relies on the identification of the synthetase genes...... and the deciphering of the domain architecture of the nonribosomal peptide synthetases (NRPSs). In the next step, candidate peptides synthesized by these NRPSs are predicted in silico, considering the specificity of incorporated monomers together with their isomery. To assess their novelty, the two...

  10. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data.

  11. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Science.gov (United States)

    2010-01-01

    preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082

  12. Computational medicinal chemistry in fragment-based drug discovery: what, how and when.

    Science.gov (United States)

    Rabal, Obdulia; Urbano-Cuadrado, Manuel; Oyarzabal, Julen

    2011-01-01

    The use of fragment-based drug discovery (FBDD) has increased in the last decade due to the encouraging results obtained to date. In this scenario, computational approaches, together with experimental information, play an important role to guide and speed up the process. By default, FBDD is generally considered as a constructive approach. However, such additive behavior is not always present, therefore, simple fragment maturation will not always deliver the expected results. In this review, computational approaches utilized in FBDD are reported together with real case studies, where applicability domains are exemplified, in order to analyze them, and then, maximize their performance and reliability. Thus, a proper use of these computational tools can minimize misleading conclusions, keeping the credit on FBDD strategy, as well as achieve higher impact in the drug-discovery process. FBDD goes one step beyond a simple constructive approach. A broad set of computational tools: docking, R group quantitative structure-activity relationship, fragmentation tools, fragments management tools, patents analysis and fragment-hopping, for example, can be utilized in FBDD, providing a clear positive impact if they are utilized in the proper scenario - what, how and when. An initial assessment of additive/non-additive behavior is a critical point to define the most convenient approach for fragments elaboration.

  13. Gene discovery and transcript analyses in the corn smut pathogen Ustilago maydis: expressed sequence tag and genome sequence comparison

    Directory of Open Access Journals (Sweden)

    Saville Barry J

    2007-09-01

    Full Text Available Abstract Background Ustilago maydis is the basidiomycete fungus responsible for common smut of corn and is a model organism for the study of fungal phytopathogenesis. To aid in the annotation of the genome sequence of this organism, several expressed sequence tag (EST libraries were generated from a variety of U. maydis cell types. In addition to utility in the context of gene identification and structure annotation, the ESTs were analyzed to identify differentially abundant transcripts and to detect evidence of alternative splicing and anti-sense transcription. Results Four cDNA libraries were constructed using RNA isolated from U. maydis diploid teliospores (U. maydis strains 518 × 521 and haploid cells of strain 521 grown under nutrient rich, carbon starved, and nitrogen starved conditions. Using the genome sequence as a scaffold, the 15,901 ESTs were assembled into 6,101 contiguous expressed sequences (contigs; among these, 5,482 corresponded to predicted genes in the MUMDB (MIPS Ustilago maydis database, while 619 aligned to regions of the genome not yet designated as genes in MUMDB. A comparison of EST abundance identified numerous genes that may be regulated in a cell type or starvation-specific manner. The transcriptional response to nitrogen starvation was assessed using RT-qPCR. The results of this suggest that there may be cross-talk between the nitrogen and carbon signalling pathways in U. maydis. Bioinformatic analysis identified numerous examples of alternative splicing and anti-sense transcription. While intron retention was the predominant form of alternative splicing in U. maydis, other varieties were also evident (e.g. exon skipping. Selected instances of both alternative splicing and anti-sense transcription were independently confirmed using RT-PCR. Conclusion Through this work: 1 substantial sequence information has been provided for U. maydis genome annotation; 2 new genes were identified through the discovery of 619

  14. Development of gene diagnosis for diabetes and cholecystitis based on gene analysis of CCK-A receptor

    International Nuclear Information System (INIS)

    Kono, Akira

    1999-01-01

    Base sequence analysis of CCKAR gene (a gene of A-type receptor for cholecystokinin) from OLETF rat, a model rat for insulin-independent diabetes was made based on the base sequence of wild CCKAR gene, which had been clarified in the previous year. From the pancreas of OLETF rat, DNA was extracted and transduced into λphage after fragmentation to construct the gene library of OLETF. Then, λphage DNA clone bound with labelled cDNA of CCKAR gene was analyzed and the gene structure was compared with that of the wild gene. It was demonstrated that CCKAR gene of OLETF had a deletion (6800 b.p.) ranging from the promoter region to the Exon 2, suggesting that CCKAR gene is not functional in OLETF rat. The whole sequence of this mutant gene was registered into Japan DNA Bank (D 50610). Then, F 2 offspring rats were obtained through crossing OLETF (female) and F344 (male) and the time course-changes in the blood glucose level after glucose loading were compared among them. The blood glucose level after glucose loading was significantly higher in the homo-mutant F 2 (CCKAR,-/-) as well as the parent OLETF rat than hetero-mutant F 2 (CCKARm-/+) or the wild rat (CCKAR,+/+). This suggests that CCKAR gene might be involved in the control of blood glucose level and an alteration of the expression level or the functions of CCKAR gene might affect the blood glucose level. (M.N.)

  15. The future of drug discovery: enabling technologies for enhancing lead characterization and profiling therapeutic potential.

    Science.gov (United States)

    Janero, David R

    2014-08-01

    Technology often serves as a handmaiden and catalyst of invention. The discovery of safe, effective medications depends critically upon experimental approaches capable of providing high-impact information on the biological effects of drug candidates early in the discovery pipeline. This information can enable reliable lead identification, pharmacological compound differentiation and successful translation of research output into clinically useful therapeutics. The shallow preclinical profiling of candidate compounds promulgates a minimalistic understanding of their biological effects and undermines the level of value creation necessary for finding quality leads worth moving forward within the development pipeline with efficiency and prognostic reliability sufficient to help remediate the current pharma-industry productivity drought. Three specific technologies discussed herein, in addition to experimental areas intimately associated with contemporary drug discovery, appear to hold particular promise for strengthening the preclinical valuation of drug candidates by deepening lead characterization. These are: i) hydrogen-deuterium exchange mass spectrometry for characterizing structural and ligand-interaction dynamics of disease-relevant proteins; ii) activity-based chemoproteomics for profiling the functional diversity of mammalian proteomes; and iii) nuclease-mediated precision gene editing for developing more translatable cellular and in vivo models of human diseases. When applied in an informed manner congruent with the clinical understanding of disease processes, technologies such as these that span levels of biological organization can serve as valuable enablers of drug discovery and potentially contribute to reducing the current, unacceptably high rates of compound clinical failure.

  16. Gene Regulation, Modulation, and Their Applications in Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Mario Flores

    2013-01-01

    Full Text Available Common microarray and next-generation sequencing data analysis concentrate on tumor subtype classification, marker detection, and transcriptional regulation discovery during biological processes by exploring the correlated gene expression patterns and their shared functions. Genetic regulatory network (GRN based approaches have been employed in many large studies in order to scrutinize for dysregulation and potential treatment controls. In addition to gene regulation and network construction, the concept of the network modulator that has significant systemic impact has been proposed, and detection algorithms have been developed in past years. Here we provide a unified mathematic description of these methods, followed with a brief survey of these modulator identification algorithms. As an early attempt to extend the concept to new RNA regulation mechanism, competitive endogenous RNA (ceRNA, into a modulator framework, we provide two applications to illustrate the network construction, modulation effect, and the preliminary finding from these networks. Those methods we surveyed and developed are used to dissect the regulated network under different modulators. Not limit to these, the concept of “modulation” can adapt to various biological mechanisms to discover the novel gene regulation mechanisms.

  17. 14 CFR 406.143 - Discovery.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 4 2010-01-01 2010-01-01 false Discovery. 406.143 Section 406.143... Transportation Adjudications § 406.143 Discovery. (a) Initiation of discovery. Any party may initiate discovery... after a complaint has been filed. (b) Methods of discovery. The following methods of discovery are...

  18. The fragile x mental retardation syndrome 20 years after the FMR1 gene discovery: an expanding universe of knowledge.

    Science.gov (United States)

    Rousseau, François; Labelle, Yves; Bussières, Johanne; Lindsay, Carmen

    2011-08-01

    The fragile X mental retardation (FXMR) syndrome is one of the most frequent causes of mental retardation. Affected individuals display a wide range of additional characteristic features including behavioural and physical phenotypes, and the extent to which individuals are affected is highly variable. For these reasons, elucidation of the pathophysiology of this disease has been an important challenge to the scientific community. 1991 marks the year of the discovery of both the FMR1 gene mutations involved in this disease, and of their dynamic nature. Although a mouse model for the disease has been available for 16 years and extensive research has been performed on the FMR1 protein (FMRP), we still understand little about how the disease develops, and no treatment has yet been shown to be effective. In this review, we summarise current knowledge on FXMR with an emphasis on the technical challenges of molecular diagnostics, on its prevalence and dynamics among populations, and on the potential of screening for FMR1 mutations.

  19. The Fragile X Mental Retardation Syndrome 20 Years After the FMR1 Gene Discovery: an Expanding Universe of Knowledge

    Science.gov (United States)

    Rousseau, François; Labelle, Yves; Bussières, Johanne; Lindsay, Carmen

    2011-01-01

    The fragile X mental retardation (FXMR) syndrome is one of the most frequent causes of mental retardation. Affected individuals display a wide range of additional characteristic features including behavioural and physical phenotypes, and the extent to which individuals are affected is highly variable. For these reasons, elucidation of the pathophysiology of this disease has been an important challenge to the scientific community. 1991 marks the year of the discovery of both the FMR1 gene mutations involved in this disease, and of their dynamic nature. Although a mouse model for the disease has been available for 16 years and extensive research has been performed on the FMR1 protein (FMRP), we still understand little about how the disease develops, and no treatment has yet been shown to be effective. In this review, we summarise current knowledge on FXMR with an emphasis on the technical challenges of molecular diagnostics, on its prevalence and dynamics among populations, and on the potential of screening for FMR1 mutations. PMID:21912443

  20. Validity and Practitality of Acid-Base Module Based on Guided Discovery Learning for Senior High School

    Science.gov (United States)

    Yerimadesi; Bayharti; Jannah, S. M.; Lufri; Festiyed; Kiram, Y.

    2018-04-01

    This Research and Development(R&D) aims to produce guided discovery learning based module on topic of acid-base and determine its validity and practicality in learning. Module development used Four D (4-D) model (define, design, develop and disseminate).This research was performed until development stage. Research’s instruments were validity and practicality questionnaires. Module was validated by five experts (three chemistry lecturers of Universitas Negeri Padang and two chemistry teachers of SMAN 9 Padang). Practicality test was done by two chemistry teachers and 30 students of SMAN 9 Padang. Kappa Cohen’s was used to analyze validity and practicality. The average moment kappa was 0.86 for validity and those for practicality were 0.85 by teachers and 0.76 by students revealing high category. It can be concluded that validity and practicality was proven for high school chemistry learning.

  1. UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia.

    Science.gov (United States)

    Zhang, Fusheng; Li, Xiaowei; Li, Zhenyu; Xu, Xiaoshuang; Peng, Bing; Qin, Xuemei; Du, Guanhua

    2014-01-01

    The dried root of Polygala tenuifolia, named Radix Polygalae, is a well-known traditional Chinese medicine. Triterpenoid saponins are some of the most important components of Radix Polygalae extracts and are widely studied because of their valuable pharmacological properties. However, the relationship between gene expression and triterpenoid saponin biosynthesis in P. tenuifolia is unclear. In this study, ultra-performance liquid chromatography (UPLC) coupled with quadrupole time-of-flight mass spectrometry (Q-TOF MS)-based metabolomic analysis was performed to identify and quantify the different chemical constituents of the roots, stems, leaves, and seeds of P. tenuifolia. A total of 22 marker compounds (VIP>1) were explored, and significant differences in all 7 triterpenoid saponins among the different tissues were found. We also observed an efficient reference gene GAPDH for different tissues in this plant and determined the expression level of some genes in the triterpenoid saponin biosynthetic pathway. Results showed that MVA pathway has more important functions in the triterpenoid saponin biosynthesis of P. tenuifolia. The expression levels of squalene synthase (SQS), squalene monooxygenase (SQE), and beta-amyrin synthase (β-AS) were highly correlated with the peak area intensity of triterpenoid saponins compared with data from UPLC/Q-TOF MS-based metabolomic analysis. This finding suggested that a combination of UPLC/Q-TOF MS-based metabolomics and gene expression analysis can effectively elucidate the mechanism of triterpenoid saponin biosynthesis and can provide useful information on gene discovery. These findings can serve as a reference for using the overexpression of genes encoding for SQS, SQE, and/or β-AS to increase the triterpenoid saponin production of P. tenuifolia.

  2. UPLC/Q-TOF MS-based metabolomics and qRT-PCR in enzyme gene screening with key role in triterpenoid saponin biosynthesis of Polygala tenuifolia.

    Directory of Open Access Journals (Sweden)

    Fusheng Zhang

    Full Text Available The dried root of Polygala tenuifolia, named Radix Polygalae, is a well-known traditional Chinese medicine. Triterpenoid saponins are some of the most important components of Radix Polygalae extracts and are widely studied because of their valuable pharmacological properties. However, the relationship between gene expression and triterpenoid saponin biosynthesis in P. tenuifolia is unclear.In this study, ultra-performance liquid chromatography (UPLC coupled with quadrupole time-of-flight mass spectrometry (Q-TOF MS-based metabolomic analysis was performed to identify and quantify the different chemical constituents of the roots, stems, leaves, and seeds of P. tenuifolia. A total of 22 marker compounds (VIP>1 were explored, and significant differences in all 7 triterpenoid saponins among the different tissues were found. We also observed an efficient reference gene GAPDH for different tissues in this plant and determined the expression level of some genes in the triterpenoid saponin biosynthetic pathway. Results showed that MVA pathway has more important functions in the triterpenoid saponin biosynthesis of P. tenuifolia. The expression levels of squalene synthase (SQS, squalene monooxygenase (SQE, and beta-amyrin synthase (β-AS were highly correlated with the peak area intensity of triterpenoid saponins compared with data from UPLC/Q-TOF MS-based metabolomic analysis.This finding suggested that a combination of UPLC/Q-TOF MS-based metabolomics and gene expression analysis can effectively elucidate the mechanism of triterpenoid saponin biosynthesis and can provide useful information on gene discovery. These findings can serve as a reference for using the overexpression of genes encoding for SQS, SQE, and/or β-AS to increase the triterpenoid saponin production of P. tenuifolia.

  3. Annotating gene sets by mining large literature collections with protein networks.

    Science.gov (United States)

    Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey

    2018-01-01

    Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.

  4. Higgs Discovery

    DEFF Research Database (Denmark)

    Sannino, Francesco

    2013-01-01

    has been challenged by the discovery of a not-so-heavy Higgs-like state. I will therefore review the recent discovery \\cite{Foadi:2012bb} that the standard model top-induced radiative corrections naturally reduce the intrinsic non-perturbative mass of the composite Higgs state towards the desired...... via first principle lattice simulations with encouraging results. The new findings show that the recent naive claims made about new strong dynamics at the electroweak scale being disfavoured by the discovery of a not-so-heavy composite Higgs are unwarranted. I will then introduce the more speculative......I discuss the impact of the discovery of a Higgs-like state on composite dynamics starting by critically examining the reasons in favour of either an elementary or composite nature of this state. Accepting the standard model interpretation I re-address the standard model vacuum stability within...

  5. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids.

    KAUST Repository

    Mithani, Aziz; Belfield, Eric J; Brown, Carly; Jiang, Caifu; Leach, Lindsey J; Harberd, Nicholas P

    2013-01-01

    The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies.We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): 'HSP base Assignment using NGS data through Diploid Similarity' (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment.We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics.

  6. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids.

    KAUST Repository

    Mithani, Aziz

    2013-09-24

    The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies.We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): \\'HSP base Assignment using NGS data through Diploid Similarity\\' (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment.We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics.

  7. Bioluminescent bacteria: lux genes as environmental biosensors

    OpenAIRE

    Nunes-Halldorson,Vânia da Silva; Duran,Norma Letícia

    2003-01-01

    Bioluminescent bacteria are widespread in natural environments. Over the years, many researchers have been studying the physiology, biochemistry and genetic control of bacterial bioluminescence. These discoveries have revolutionized the area of Environmental Microbiology through the use of luminescent genes as biosensors for environmental studies. This paper will review the chronology of scientific discoveries on bacterial bioluminescence and the current applications of bioluminescence in env...

  8. Student Responses Toward Student Worksheets Based on Discovery Learning for Students with Intrapersonal and Interpersonal Intelligence

    Science.gov (United States)

    Yerizon, Y.; Putra, A. A.; Subhan, M.

    2018-04-01

    Students have a low mathematical ability because they are used to learning to hear the teacher's explanation. For that students are given activities to sharpen his ability in math. One way to do that is to create discovery learning based work sheet. The development of this worksheet took into account specific student learning styles including in schools that have classified students based on multiple intelligences. The dominant learning styles in the classroom were intrapersonal and interpersonal. The purpose of this study was to discover students’ responses to the mathematics work sheets of the junior high school with a discovery learning approach suitable for students with Intrapersonal and Interpersonal Intelligence. This tool was developed using a development model adapted from the Plomp model. The development process of this tools consists of 3 phases: front-end analysis/preliminary research, development/prototype phase and assessment phase. From the results of the research, it is found that students have good response to the resulting work sheet. The worksheet was understood well by students and its helps student in understanding the concept learned.

  9. A PLSPM-Based Test Statistic for Detecting Gene-Gene Co-Association in Genome-Wide Association Study with Case-Control Design

    Science.gov (United States)

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods. PMID:23620809

  10. Genome-Based Studies of Marine Microorganisms to Maximize the Diversity of Natural Products Discovery for Medical Treatments

    Directory of Open Access Journals (Sweden)

    Xin-Qing Zhao

    2011-01-01

    Full Text Available Marine microorganisms are rich source for natural products which play important roles in pharmaceutical industry. Over the past decade, genome-based studies of marine microorganisms have unveiled the tremendous diversity of the producers of natural products and also contributed to the efficiency of harness the strain diversity and chemical diversity, as well as the genetic diversity of marine microorganisms for the rapid discovery and generation of new natural products. In the meantime, genomic information retrieved from marine symbiotic microorganisms can also be employed for the discovery of new medical molecules from yet-unculturable microorganisms. In this paper, the recent progress in the genomic research of marine microorganisms is reviewed; new tools of genome mining as well as the advance in the activation of orphan pathways and metagenomic studies are summarized. Genome-based research of marine microorganisms will maximize the biodiscovery process and solve the problems of supply and sustainability of drug molecules for medical treatments.

  11. Exploring the role of peptides in polymer-based gene delivery.

    Science.gov (United States)

    Sun, Yanping; Yang, Zhen; Wang, Chunxi; Yang, Tianzhi; Cai, Cuifang; Zhao, Xiaoyun; Yang, Li; Ding, Pingtian

    2017-09-15

    Polymers are widely studied as non-viral gene vectors because of their strong DNA binding ability, capacity to carry large payload, flexibility of chemical modifications, low immunogenicity, and facile processes for manufacturing. However, high cytotoxicity and low transfection efficiency substantially restrict their application in clinical trials. Incorporating functional peptides is a promising approach to address these issues. Peptides demonstrate various functions in polymer-based gene delivery systems, such as targeting to specific cells, breaching membrane barriers, facilitating DNA condensation and release, and lowering cytotoxicity. In this review, we systematically summarize the role of peptides in polymer-based gene delivery, and elaborate how to rationally design polymer-peptide based gene delivery vectors. Polymers are widely studied as non-viral gene vectors, but suffer from high cytotoxicity and low transfection efficiency. Incorporating short, bioactive peptides into polymer-based gene delivery systems can address this issue. Peptides demonstrate various functions in polymer-based gene delivery systems, such as targeting to specific cells, breaching membrane barriers, facilitating DNA condensation and release, and lowering cytotoxicity. In this review, we highlight the peptides' roles in polymer-based gene delivery, and elaborate how to utilize various functional peptides to enhance the transfection efficiency of polymers. The optimized peptide-polymer vectors should be able to alter their structures and functions according to biological microenvironments and utilize inherent intracellular pathways of cells, and consequently overcome the barriers during gene delivery to enhance transfection efficiency. Copyright © 2017 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.

  12. Analytical strategies for discovery and replication of genetic effects in pharmacogenomic studies

    Directory of Open Access Journals (Sweden)

    Kohler JR

    2014-08-01

    Full Text Available Jared R Kohler, Tobias Guennel, Scott L MarshallBioStat Solutions, Inc., Frederick, MD, USAAbstract: In the past decade, the pharmaceutical industry and biomedical research sector have devoted considerable resources to pharmacogenomics (PGx with the hope that understanding genetic variation in patients would deliver on the promise of personalized medicine. With the advent of new technologies and the improved collection of DNA samples, the roadblock to advancements in PGx discovery is no longer the lack of high-density genetic information captured on patient populations, but rather the development, adaptation, and tailoring of analytical strategies to effectively harness this wealth of information. The current analytical paradigm in PGx considers the single-nucleotide polymorphism (SNP as the genomic feature of interest and performs single SNP association tests to discover PGx effects – ie, genetic effects impacting drug response. While it can be straightforward to process single SNP results and to consider how this information may be extended for use in downstream patient stratification, the rate of replication for single SNP associations has been low and the desired success of producing clinically and commercially viable biomarkers has not been realized. This may be due to the fact that single SNP association testing is suboptimal given the complexities of PGx discovery in the clinical trial setting, including: 1 relatively small sample sizes; 2 diverse clinical cohorts within and across trials due to genetic ancestry (potentially impacting the ability to replicate findings; and 3 the potential polygenic nature of a drug response. Subsequently, a shift in the current paradigm is proposed: to consider the gene as the genomic feature of interest in PGx discovery. The proof-of-concept study presented in this manuscript demonstrates that genomic region-based association testing has the potential to improve the power of detecting single SNP or

  13. Structure-based drug discovery for combating influenza virus by targeting the PA-PB1 interaction.

    Science.gov (United States)

    Watanabe, Ken; Ishikawa, Takeshi; Otaki, Hiroki; Mizuta, Satoshi; Hamada, Tsuyoshi; Nakagaki, Takehiro; Ishibashi, Daisuke; Urata, Shuzo; Yasuda, Jiro; Tanaka, Yoshimasa; Nishida, Noriyuki

    2017-08-25

    Influenza virus infections are serious public health concerns throughout the world. The development of compounds with novel mechanisms of action is urgently required due to the emergence of viruses with resistance to the currently-approved anti-influenza viral drugs. We performed in silico screening using a structure-based drug discovery algorithm called Nagasaki University Docking Engine (NUDE), which is optimised for a GPU-based supercomputer (DEstination for Gpu Intensive MAchine; DEGIMA), by targeting influenza viral PA protein. The compounds selected by NUDE were tested for anti-influenza virus activity using a cell-based assay. The most potent compound, designated as PA-49, is a medium-sized quinolinone derivative bearing a tetrazole moiety, and it inhibited the replication of influenza virus A/WSN/33 at a half maximal inhibitory concentration of 0.47 μM. PA-49 has the ability to bind PA and its anti-influenza activity was promising against various influenza strains, including a clinical isolate of A(H1N1)pdm09 and type B viruses. The docking simulation suggested that PA-49 interrupts the PA-PB1 interface where important amino acids are mostly conserved in the virus strains tested, suggesting the strain independent utility. Because our NUDE/DEGIMA system is rapid and efficient, it may help effective drug discovery against the influenza virus and other emerging viruses.

  14. Application of Combination High-Throughput Phenotypic Screening and Target Identification Methods for the Discovery of Natural Product-Based Combination Drugs.

    Science.gov (United States)

    Isgut, Monica; Rao, Mukkavilli; Yang, Chunhua; Subrahmanyam, Vangala; Rida, Padmashree C G; Aneja, Ritu

    2018-03-01

    Modern drug discovery efforts have had mediocre success rates with increasing developmental costs, and this has encouraged pharmaceutical scientists to seek innovative approaches. Recently with the rise of the fields of systems biology and metabolomics, network pharmacology (NP) has begun to emerge as a new paradigm in drug discovery, with a focus on multiple targets and drug combinations for treating disease. Studies on the benefits of drug combinations lay the groundwork for a renewed focus on natural products in drug discovery. Natural products consist of a multitude of constituents that can act on a variety of targets in the body to induce pharmacodynamic responses that may together culminate in an additive or synergistic therapeutic effect. Although natural products cannot be patented, they can be used as starting points in the discovery of potent combination therapeutics. The optimal mix of bioactive ingredients in natural products can be determined via phenotypic screening. The targets and molecular mechanisms of action of these active ingredients can then be determined using chemical proteomics, and by implementing a reverse pharmacokinetics approach. This review article provides evidence supporting the potential benefits of natural product-based combination drugs, and summarizes drug discovery methods that can be applied to this class of drugs. © 2017 Wiley Periodicals, Inc.

  15. The rays of life, centennial of discovery of Radium

    International Nuclear Information System (INIS)

    Constantin, Enrique; Plazas, Maria C.

    1999-01-01

    The authors make a recount from the discovery of the rays X for William Conrad Roentgen, in November of 1.985, until our days of the main discoveries and advances in medicine, having like base the radium and their importance in the treatment of the cancer

  16. Reconstruction of ribosomal RNA genes from metagenomic data.

    Directory of Open Access Journals (Sweden)

    Lu Fan

    Full Text Available Direct sequencing of environmental DNA (metagenomics has a great potential for describing the 16S rRNA gene diversity of microbial communities. However current approaches using this 16S rRNA gene information to describe community diversity suffer from low taxonomic resolution or chimera problems. Here we describe a new strategy that involves stringent assembly and data filtering to reconstruct full-length 16S rRNA genes from metagenomicpyrosequencing data. Simulations showed that reconstructed 16S rRNA genes provided a true picture of the community diversity, had minimal rates of chimera formation and gave taxonomic resolution down to genus level. The strategy was furthermore compared to PCR-based methods to determine the microbial diversity in two marine sponges. This showed that about 30% of the abundant phylotypes reconstructed from metagenomic data failed to be amplified by PCR. Our approach is readily applicable to existing metagenomic datasets and is expected to lead to the discovery of new microbial phylotypes.

  17. Identification of novel type 1 diabetes candidate genes by integrating genome-wide association data, protein-protein interactions, and human pancreatic islet gene expression

    DEFF Research Database (Denmark)

    Bergholdt, Regine; Brorsson, Caroline; Palleja, Albert

    2012-01-01

    Genome-wide association studies (GWAS) have heralded a new era in susceptibility locus discovery in complex diseases. For type 1 diabetes, >40 susceptibility loci have been discovered. However, GWAS do not inevitably lead to identification of the gene or genes in a given locus associated with dis......-cells. Our results provide novel insight to the mechanisms behind type 1 diabetes pathogenesis and, thus, may provide the basis for the design of novel treatment strategies.......Genome-wide association studies (GWAS) have heralded a new era in susceptibility locus discovery in complex diseases. For type 1 diabetes, >40 susceptibility loci have been discovered. However, GWAS do not inevitably lead to identification of the gene or genes in a given locus associated...... with disease, and they do not typically inform the broader context in which the disease genes operate. Here, we integrated type 1 diabetes GWAS data with protein-protein interactions to construct biological networks of relevance for disease. A total of 17 networks were identified. To prioritize...

  18. Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes

    Science.gov (United States)

    Ashkani, Jahanshah; Naidoo, Kevin J.

    2016-05-01

    Aberrant glycosylation in tumours stem from altered glycosyltransferase (GT) gene expression but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data are able to classify six cancers; breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlates to clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.

  19. The discovery of the periodic table as a case of simultaneous discovery.

    Science.gov (United States)

    Scerri, Eric

    2015-03-13

    The article examines the question of priority and simultaneous discovery in the context of the discovery of the periodic system. It is argued that rather than being anomalous, simultaneous discovery is the rule. Moreover, I argue that the discovery of the periodic system by at least six authors in over a period of 7 years represents one of the best examples of a multiple discovery. This notion is supported by a new view of the evolutionary development of science through a mechanism that is dubbed Sci-Gaia by analogy with Lovelock's Gaia hypothesis. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  20. Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol.

    Directory of Open Access Journals (Sweden)

    Fei Lu

    Full Text Available Switchgrass (Panicum virgatum L. is a perennial grass that has been designated as an herbaceous model biofuel crop for the United States of America. To facilitate accelerated breeding programs of switchgrass, we developed both an association panel and linkage populations for genome-wide association study (GWAS and genomic selection (GS. All of the 840 individuals were then genotyped using genotyping by sequencing (GBS, generating 350 GB of sequence in total. As a highly heterozygous polyploid (tetraploid and octoploid species lacking a reference genome, switchgrass is highly intractable with earlier methodologies of single nucleotide polymorphism (SNP discovery. To access the genetic diversity of species like switchgrass, we developed a SNP discovery pipeline based on a network approach called the Universal Network-Enabled Analysis Kit (UNEAK. Complexities that hinder single nucleotide polymorphism discovery, such as repeats, paralogs, and sequencing errors, are easily resolved with UNEAK. Here, 1.2 million putative SNPs were discovered in a diverse collection of primarily upland, northern-adapted switchgrass populations. Further analysis of this data set revealed the fundamentally diploid nature of tetraploid switchgrass. Taking advantage of the high conservation of genome structure between switchgrass and foxtail millet (Setaria italica (L. P. Beauv., two parent-specific, synteny-based, ultra high-density linkage maps containing a total of 88,217 SNPs were constructed. Also, our results showed clear patterns of isolation-by-distance and isolation-by-ploidy in natural populations of switchgrass. Phylogenetic analysis supported a general south-to-north migration path of switchgrass. In addition, this analysis suggested that upland tetraploid arose from upland octoploid. All together, this study provides unparalleled insights into the diversity, genomic complexity, population structure, phylogeny, phylogeography, ploidy, and evolutionary dynamics

  1. Beyond Discovery

    DEFF Research Database (Denmark)

    Korsgaard, Steffen; Sassmannshausen, Sean Patrick

    2017-01-01

    In this chapter we explore four alternatives to the dominant discovery view of entrepreneurship; the development view, the construction view, the evolutionary view, and the Neo-Austrian view. We outline the main critique points of the discovery presented in these four alternatives, as well...

  2. Librarian-Initiated Publications Discovery: How Do Digital Depository Librarians Discover and Select Web-Based Government Publications for State Digital Depositories?

    Science.gov (United States)

    Lin, Chi-Shiou; Eschenfelder, Kristin R.

    2010-01-01

    This paper reports on a study of librarian initiated publications discovery (LIPD) in U.S. state digital depository programs using the OCLC Digital Archive to preserve web-based government publications for permanent public access. This paper describes a model of LIPD processes based on empirical investigations of four OCLC DA-based digital…

  3. Key drivers of biomedical innovation in cancer drug discovery

    OpenAIRE

    Huber, Margit A; Kraut, Norbert

    2014-01-01

    Discovery and translational research has led to the identification of a series of ?cancer drivers??genes that, when mutated or otherwise misregulated, can drive malignancy. An increasing number of drugs that directly target such drivers have demonstrated activity in clinical trials and are shaping a new landscape for molecularly targeted cancer therapies. Such therapies rely on molecular and genetic diagnostic tests to detect the presence of a biomarker that predicts response. Here, we highli...

  4. Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model.

    Directory of Open Access Journals (Sweden)

    Gerhard Moser

    2015-04-01

    Full Text Available Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Welcome Trust Case Control Consortium (WTCCC data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96% had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

  5. Discovery of genomic intervals that underlie nematode responses to benzimidazoles.

    Science.gov (United States)

    Zamanian, Mostafa; Cook, Daniel E; Zdraljevic, Stefan; Brady, Shannon C; Lee, Daehan; Lee, Junho; Andersen, Erik C

    2018-03-01

    Parasitic nematodes impose a debilitating health and economic burden across much of the world. Nematode resistance to anthelmintic drugs threatens parasite control efforts in both human and veterinary medicine. Despite this threat, the genetic landscape of potential resistance mechanisms to these critical drugs remains largely unexplored. Here, we exploit natural variation in the model nematodes Caenorhabditis elegans and Caenorhabditis briggsae to discover quantitative trait loci (QTL) that control sensitivity to benzimidazoles widely used in human and animal medicine. High-throughput phenotyping of albendazole, fenbendazole, mebendazole, and thiabendazole responses in panels of recombinant lines led to the discovery of over 15 QTL in C. elegans and four QTL in C. briggsae associated with divergent responses to these anthelmintics. Many of these QTL are conserved across benzimidazole derivatives, but others show drug and dose specificity. We used near-isogenic lines to recapitulate and narrow the C. elegans albendazole QTL of largest effect and identified candidate variants correlated with the resistance phenotype. These QTL do not overlap with known benzimidazole target resistance genes from parasitic nematodes and present specific new leads for the discovery of novel mechanisms of nematode benzimidazole resistance. Analyses of orthologous genes reveal conservation of candidate benzimidazole resistance genes in medically important parasitic nematodes. These data provide a basis for extending these approaches to other anthelmintic drug classes and a pathway towards validating new markers for anthelmintic resistance that can be deployed to improve parasite disease control.

  6. "Eureka, Eureka!" Discoveries in Science

    Science.gov (United States)

    Agarwal, Pankaj

    2011-01-01

    Accidental discoveries have been of significant value in the progress of science. Although accidental discoveries are more common in pharmacology and chemistry, other branches of science have also benefited from such discoveries. While most discoveries are the result of persistent research, famous accidental discoveries provide a fascinating…

  7. Serious limitations of the QTL/Microarray approach for QTL gene discovery

    Directory of Open Access Journals (Sweden)

    Warden Craig H

    2010-07-01

    Full Text Available Abstract Background It has been proposed that the use of gene expression microarrays in nonrecombinant parental or congenic strains can accelerate the process of isolating individual genes underlying quantitative trait loci (QTL. However, the effectiveness of this approach has not been assessed. Results Thirty-seven studies that have implemented the QTL/microarray approach in rodents were reviewed. About 30% of studies showed enrichment for QTL candidates, mostly in comparisons between congenic and background strains. Three studies led to the identification of an underlying QTL gene. To complement the literature results, a microarray experiment was performed using three mouse congenic strains isolating the effects of at least 25 biometric QTL. Results show that genes in the congenic donor regions were preferentially selected. However, within donor regions, the distribution of differentially expressed genes was homogeneous once gene density was accounted for. Genes within identical-by-descent (IBD regions were less likely to be differentially expressed in chromosome 2, but not in chromosomes 11 and 17. Furthermore, expression of QTL regulated in cis (cis eQTL showed higher expression in the background genotype, which was partially explained by the presence of single nucleotide polymorphisms (SNP. Conclusions The literature shows limited successes from the QTL/microarray approach to identify QTL genes. Our own results from microarray profiling of three congenic strains revealed a strong tendency to select cis-eQTL over trans-eQTL. IBD regions had little effect on rate of differential expression, and we provide several reasons why IBD should not be used to discard eQTL candidates. In addition, mismatch probes produced false cis-eQTL that could not be completely removed with the current strains genotypes and low probe density microarrays. The reviewed studies did not account for lack of coverage from the platforms used and therefore removed genes

  8. 30 CFR 44.24 - Discovery.

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Discovery. 44.24 Section 44.24 Mineral... Discovery. Parties shall be governed in their conduct of discovery by appropriate provisions of the Federal... discovery. Alternative periods of time for discovery may be prescribed by the presiding administrative law...

  9. 19 CFR 356.20 - Discovery.

    Science.gov (United States)

    2010-04-01

    ... 19 Customs Duties 3 2010-04-01 2010-04-01 false Discovery. 356.20 Section 356.20 Customs Duties... § 356.20 Discovery. (a) Voluntary discovery. All parties are encouraged to engage in voluntary discovery... sanctions proceeding. (b) Limitations on discovery. The administrative law judge shall place such limits...

  10. Chemical Discovery

    Science.gov (United States)

    Brown, Herbert C.

    1974-01-01

    The role of discovery in the advance of the science of chemistry and the factors that are currently operating to handicap that function are considered. Examples are drawn from the author's work with boranes. The thesis that exploratory research and discovery should be encouraged is stressed. (DT)

  11. 24 CFR 180.500 - Discovery.

    Science.gov (United States)

    2010-04-01

    ... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Discovery. 180.500 Section 180.500... OPPORTUNITY CONSOLIDATED HUD HEARING PROCEDURES FOR CIVIL RIGHTS MATTERS Discovery § 180.500 Discovery. (a) In general. This subpart governs discovery in aid of administrative proceedings under this part. Discovery in...

  12. 22 CFR 224.21 - Discovery.

    Science.gov (United States)

    2010-04-01

    ... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Discovery. 224.21 Section 224.21 Foreign....21 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of... parties, discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery...

  13. Some statistical properties of gene expression clustering for array data

    DEFF Research Database (Denmark)

    Abreu, G C G; Pinheiro, A; Drummond, R D

    2010-01-01

    DNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by SOM-based clustering. Comparisons between SOM and parametric clustering are presented...... for simulated as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM, that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, as well as some supplementary material, at the following address: https...

  14. Network Diffusion-Based Prioritization of Autism Risk Genes Identifies Significantly Connected Gene Modules

    Directory of Open Access Journals (Sweden)

    Ettore Mosca

    2017-09-01

    Full Text Available Autism spectrum disorder (ASD is marked by a strong genetic heterogeneity, which is underlined by the low overlap between ASD risk gene lists proposed in different studies. In this context, molecular networks can be used to analyze the results of several genome-wide studies in order to underline those network regions harboring genetic variations associated with ASD, the so-called “disease modules.” In this work, we used a recent network diffusion-based approach to jointly analyze multiple ASD risk gene lists. We defined genome-scale prioritizations of human genes in relation to ASD genes from multiple studies, found significantly connected gene modules associated with ASD and predicted genes functionally related to ASD risk genes. Most of them play a role in synapsis and neuronal development and function; many are related to syndromes that can be in comorbidity with ASD and the remaining are involved in epigenetics, cell cycle, cell adhesion and cancer.

  15. PERSONAL AND CIRCUMSTANTIAL FACTORS INFLUENCING THE ACT OF DISCOVERY.

    Science.gov (United States)

    OSTRANDER, EDWARD R.

    HOW STUDENTS SAY THEY LEARN WAS INVESTIGATED. INTERVIEWS WITH A RANDOM SAMPLE OF 74 WOMEN STUDENTS POSED QUESTIONS ABOUT THE NATURE, FREQUENCY, PATTERNS, AND CIRCUMSTANCES UNDER WHICH ACTS OF DISCOVERY TAKE PLACE IN THE ACADEMIC SETTING. STUDENTS WERE ASSIGNED DISCOVERY RATINGS BASED ON READINGS OF TYPESCRIPTS. EACH STUDENT WAS CLASSIFIED AND…

  16. 19 CFR 207.109 - Discovery.

    Science.gov (United States)

    2010-04-01

    ... 19 Customs Duties 3 2010-04-01 2010-04-01 false Discovery. 207.109 Section 207.109 Customs Duties... and Committee Proceedings § 207.109 Discovery. (a) Discovery methods. All parties may obtain discovery under such terms and limitations as the administrative law judge may order. Discovery may be by one or...

  17. 15 CFR 25.21 - Discovery.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Discovery. 25.21 Section 25.21... Discovery. (a) The following types of discovery are authorized: (1) Requests for production of documents for..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...

  18. Novel subtractive transcription-based amplification of mRNA (STAR method and its application in search of rare and differentially expressed genes in AD brains

    Directory of Open Access Journals (Sweden)

    Walker P Roy

    2006-11-01

    Full Text Available Abstract Background Alzheimer's disease (AD is a complex disorder that involves multiple biological processes. Many genes implicated in these processes may be present in low abundance in the human brain. DNA microarray analysis identifies changed genes that are expressed at high or moderate levels. Complementary to this approach, we described here a novel technology designed specifically to isolate rare and novel genes previously undetectable by other methods. We have used this method to identify differentially expressed genes in brains affected by AD. Our method, termed Subtractive Transcription-based Amplification of mRNA (STAR, is a combination of subtractive RNA/DNA hybridization and RNA amplification, which allows the removal of non-differentially expressed transcripts and the linear amplification of the differentially expressed genes. Results Using the STAR technology we have identified over 800 differentially expressed sequences in AD brains, both up- and down- regulated, compared to age-matched controls. Over 55% of the sequences represent genes of unknown function and roughly half of them were novel and rare discoveries in the human brain. The expression changes of nearly 80 unique genes were further confirmed by qRT-PCR and the association of additional genes with AD and/or neurodegeneration was established using an in-house literature mining tool (LitMiner. Conclusion The STAR process significantly amplifies unique and rare sequences relative to abundant housekeeping genes and, as a consequence, identifies genes not previously linked to AD. This method also offers new opportunities to study the subtle changes in gene expression that potentially contribute to the development and/or progression of AD.

  19. A Novel Mobile Video Community Discovery Scheme Using Ontology-Based Semantical Interest Capture

    Directory of Open Access Journals (Sweden)

    Ruiling Zhang

    2016-01-01

    Full Text Available Leveraging network virtualization technologies, the community-based video systems rely on the measurement of common interests to define and steady relationship between community members, which promotes video sharing performance and improves scalability community structure. In this paper, we propose a novel mobile Video Community discovery scheme using ontology-based semantical interest capture (VCOSI. An ontology-based semantical extension approach is proposed, which describes video content and measures video similarity according to video key word selection methods. In order to reduce the calculation load of video similarity, VCOSI designs a prefix-filtering-based estimation algorithm to decrease energy consumption of mobile nodes. VCOSI further proposes a member relationship estimate method to construct scalable and resilient node communities, which promotes video sharing capacity of video systems with the flexible and economic community maintenance. Extensive tests show how VCOSI obtains better performance results in comparison with other state-of-the-art solutions.

  20. Genome-wide Gene Expression Profiling of SCID Mice with T-cell-mediated Colitis

    DEFF Research Database (Denmark)

    Brudzewsky, D.; Pedersen, A. E.; Claesson, M. H.

    2009-01-01

    Inflammatory bowel disease (IBD) is a multifactorial disorder with an unknown aetiology. The aim of this study is to employ a murine model of IBD to identify pathways and genes, which may play a key role in the pathogenesis of IBD and could be important for discovery of new disease markers in human...... and colitis mice, and among these genes there is an overrepresentation of genes involved in inflammatory processes. Some of the most significant genes showing higher expression encode S100A proteins and chemokines involved in trafficking of leucocytes in inflammatory areas. Classification by gene clustering...... based on the genes with the significantly altered gene expression corresponds to two different levels of inflammation as established by the histological scoring of the inflamed rectum. These data demonstrate that this SCID T-cell transfer model is a useful animal model for human IBD and can be used...

  1. Scientific Prediction and Prophetic Patenting in Drug Discovery.

    Science.gov (United States)

    Curry, Stephen H; Schneiderman, Anne M

    2015-01-01

    Pharmaceutical patenting involves writing claims based on both discoveries already made, and on prophesy of future developments in an ongoing project. This is necessitated by the very different timelines involved in the drug discovery and product development process on the one hand, and successful patenting on the other. If patents are sought too early there is a risk that patent examiners will disallow claims because of lack of enablement. If patenting is delayed, claims are at risk of being denied on the basis of existence of prior art, because the body of relevant known science will have developed significantly while the project was being pursued. This review examines the role of prophetic patenting in relation to the essential predictability of many aspects of drug discovery science, promoting the concepts of discipline-related and project-related prediction. This is especially directed towards patenting activities supporting commercialization of academia-based discoveries, where long project timelines occur, and where experience, and resources to pay for patenting, are limited. The need for improved collaborative understanding among project scientists, technology transfer professionals in, for example, universities, patent attorneys, and patent examiners is emphasized.

  2. Fast gene ontology based clustering for microarray experiments.

    Science.gov (United States)

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  3. [Discovery of the target genes inhibited by formic acid in Candida shehatae].

    Science.gov (United States)

    Cai, Peng; Xiong, Xujie; Xu, Yong; Yong, Qiang; Zhu, Junjun; Shiyuan, Yu

    2014-01-04

    At transcriptional level, the inhibitory effects of formic acid was investigated on Candida shehatae, a model yeast strain capable of fermenting xylose to ethanol. Thereby, the target genes were regulated by formic acid and the transcript profiles were discovered. On the basis of the transcriptome data of C. shehatae metabolizing glucose and xylose, the genes responsible for ethanol fermentation were chosen as candidates by the combined method of yeast metabolic pathway analysis and manual gene BLAST search. These candidates were then quantitatively detected by RQ-PCR technique to find the regulating genes under gradient doses of formic acid. By quantitative analysis of 42 candidate genes, we finally identified 10 and 5 genes as markedly down-regulated and up-regulated targets by formic acid, respectively. With regard to gene transcripts regulated by formic acid in C. shehatae, the markedly down-regulated genes ranking declines as follows: xylitol dehydrogenase (XYL2), acetyl-CoA synthetase (ACS), ribose-5-phosphate isomerase (RKI), transaldolase (TAL), phosphogluconate dehydrogenase (GND1), transketolase (TKL), glucose-6-phosphate dehydrogenase (ZWF1), xylose reductase (XYL1), pyruvate dehydrogenase (PDH) and pyruvate decarboxylase (PDC); and a declining rank for up-regulated gens as follows: fructose-bisphosphate aldolase (ALD), glucokinase (GLK), malate dehydrogenase (MDH), 6-phosphofructokinase (PFK) and alcohol dehydrogenase (ADH).

  4. Transcriptomics Analysis of Crassostrea hongkongensis for the Discovery of Reproduction-Related Genes

    Science.gov (United States)

    Tong, Ying; Zhang, Yang; Huang, Jiaomei; Xiao, Shu; Zhang, Yuehuan; Li, Jun; Chen, Jinhui; Yu, Ziniu

    2015-01-01

    Background The reproductive mechanisms of mollusk species have been interesting targets in biological research because of the diverse reproductive strategies observed in this phylum. These species have also been studied for the development of fishery technologies in molluscan aquaculture. Although the molecular mechanisms underlying the reproductive process have been well studied in animal models, the relevant information from mollusks remains limited, particularly in species of great commercial interest. Crassostrea hongkongensis is the dominant oyster species that is distributed along the coast of the South China Sea and little genomic information on this species is available. Currently, high-throughput sequencing techniques have been widely used for investigating the basis of physiological processes and facilitating the establishment of adequate genetic selection programs. Results The C.hongkongensis transcriptome included a total of 1,595,855 reads, which were generated by 454 sequencing and were assembled into 41,472 contigs using de novo methods. Contigs were clustered into 33,920 isotigs and further grouped into 22,829 isogroups. Approximately 77.6% of the isogroups were successfully annotated by the Nr database. More than 1,910 genes were identified as being related to reproduction. Some key genes involved in germline development, sex determination and differentiation were identified for the first time in C.hongkongensis (nanos, piwi, ATRX, FoxL2, β-catenin, etc.). Gene expression analysis indicated that vasa, nanos, piwi, ATRX, FoxL2, β-catenin and SRD5A1 were highly or specifically expressed in C.hongkongensis gonads. Additionally, 94,056 single nucleotide polymorphisms (SNPs) and 1,699 simple sequence repeats (SSRs) were compiled. Conclusions Our study significantly increased C.hongkongensis genomic information based on transcriptomics analysis. The group of reproduction-related genes identified in the present study constitutes a new tool for research

  5. Transcriptomics Analysis of Crassostrea hongkongensis for the Discovery of Reproduction-Related Genes.

    Directory of Open Access Journals (Sweden)

    Ying Tong

    Full Text Available The reproductive mechanisms of mollusk species have been interesting targets in biological research because of the diverse reproductive strategies observed in this phylum. These species have also been studied for the development of fishery technologies in molluscan aquaculture. Although the molecular mechanisms underlying the reproductive process have been well studied in animal models, the relevant information from mollusks remains limited, particularly in species of great commercial interest. Crassostrea hongkongensis is the dominant oyster species that is distributed along the coast of the South China Sea and little genomic information on this species is available. Currently, high-throughput sequencing techniques have been widely used for investigating the basis of physiological processes and facilitating the establishment of adequate genetic selection programs.The C.hongkongensis transcriptome included a total of 1,595,855 reads, which were generated by 454 sequencing and were assembled into 41,472 contigs using de novo methods. Contigs were clustered into 33,920 isotigs and further grouped into 22,829 isogroups. Approximately 77.6% of the isogroups were successfully annotated by the Nr database. More than 1,910 genes were identified as being related to reproduction. Some key genes involved in germline development, sex determination and differentiation were identified for the first time in C.hongkongensis (nanos, piwi, ATRX, FoxL2, β-catenin, etc.. Gene expression analysis indicated that vasa, nanos, piwi, ATRX, FoxL2, β-catenin and SRD5A1 were highly or specifically expressed in C.hongkongensis gonads. Additionally, 94,056 single nucleotide polymorphisms (SNPs and 1,699 simple sequence repeats (SSRs were compiled.Our study significantly increased C.hongkongensis genomic information based on transcriptomics analysis. The group of reproduction-related genes identified in the present study constitutes a new tool for research on bivalve

  6. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Conclusion Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate

  7. Facilitating Students' Interaction with Real Gas Properties Using a Discovery-Based Approach and Molecular Dynamics Simulations

    Science.gov (United States)

    Sweet, Chelsea; Akinfenwa, Oyewumi; Foley, Jonathan J., IV

    2018-01-01

    We present an interactive discovery-based approach to studying the properties of real gases using simple, yet realistic, molecular dynamics software. Use of this approach opens up a variety of opportunities for students to interact with the behaviors and underlying theories of real gases. Students can visualize gas behavior under a variety of…

  8. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development.

    Science.gov (United States)

    Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer

    2015-01-01

    Virtual screening is an important step in early-phase of drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purpose. Here, we aim to develop a new tool, which can classify molecules as drug-like and nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, first, performances of twenty-three different machine learning algorithms are compared by ten different measures, then, ten best performing algorithms have been selected based on principal component and hierarchical cluster analysis results. Besides classification, this application has also ability to create heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.

  9. Genetic interaction motif finding by expectation maximization – a novel statistical model for inferring gene modules from synthetic lethality

    Directory of Open Access Journals (Sweden)

    Ye Ping

    2005-12-01

    Full Text Available Abstract Background Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example greater probability of membership in a single protein complex may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets. Results We have developed Genetic Interaction Motif Finding (GIMF, an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast. Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. Conclusion GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes the generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic

  10. 39 CFR 963.14 - Discovery.

    Science.gov (United States)

    2010-07-01

    ... 39 Postal Service 1 2010-07-01 2010-07-01 false Discovery. 963.14 Section 963.14 Postal Service... PANDERING ADVERTISEMENTS STATUTE, 39 U.S.C. 3008 § 963.14 Discovery. Discovery is to be conducted on a... such discovery as he or she deems reasonable and necessary. Discovery may include one or more of the...

  11. The analysis of student’s critical thinking ability on discovery learning by using hand on activity based on the curiosity

    Science.gov (United States)

    Sulistiani, E.; Waluya, S. B.; Masrukan

    2018-03-01

    This study aims to determine (1) the effectiveness of Discovery Learning model by using Hand on Activity toward critical thinking abilities, and (2) to describe students’ critical thinking abilities in Discovery Learning by Hand on Activity based on curiosity. This study is mixed method research with concurrent embedded design. Sample of this study are students of VII A and VII B of SMP Daarul Qur’an Ungaran. While the subject in this study is based on the curiosity of the students groups are classified Epistemic Curiosity (EC) and Perceptual Curiosity (PC). The results showed that the learning of Discovery Learning by using Hand on Activity is effective toward mathematics critical thinking abilities. Students of the EC type are able to complete six indicators of mathematics critical thinking abilities, although there are still two indicators that the result is less than the maximum. While students of PC type have not fully been able to complete the indicator of mathematics critical thinking abilities. They are only strong on indicators formulating questions, while on the other five indicators they are still weak. The critical thinking abilities of EC’s students is better than the critical thinking abilities of the PC’s students.

  12. A Bioinorganic Approach to Fragment-Based Drug Discovery Targeting Metalloenzymes.

    Science.gov (United States)

    Cohen, Seth M

    2017-08-15

    Metal-dependent enzymes (i.e., metalloenzymes) make up a large fraction of all enzymes and are critically important in a wide range of biological processes, including DNA modification, protein homeostasis, antibiotic resistance, and many others. Consequently, metalloenzymes represent a vast and largely untapped space for drug development. The discovery of effective therapeutics that target metalloenzymes lies squarely at the interface of bioinorganic and medicinal chemistry and requires expertise, methods, and strategies from both fields to mount an effective campaign. In this Account, our research program that brings together the principles and methods of bioinorganic and medicinal chemistry are described, in an effort to bridge the gap between these fields and address an important class of medicinal targets. Fragment-based drug discovery (FBDD) is an important drug discovery approach that is particularly well suited for metalloenzyme inhibitor development. FBDD uses relatively small but diverse chemical structures that allow for the assembly of privileged molecular collections that focus on a specific feature of the target enzyme. For metalloenzyme inhibition, the specific feature is rather obvious, namely, a metal-dependent active site. Surprisingly, prior to our work, the exploration of diverse molecular fragments for binding the metal active sites of metalloenzymes was largely unexplored. By assembling a modest library of metal-binding pharmacophores (MBPs), we have been able to find lead hits for many metalloenzymes and, from these hits, develop inhibitors that act via novel mechanisms of action. A specific case study on the use of this strategy to identify a first-in-class inhibitor of zinc-dependent Rpn11 (a component of the proteasome) is highlighted. The application of FBDD for the development of metalloenzyme inhibitors has raised several other compelling questions, such as how the metalloenzyme active site influences the coordination chemistry of bound

  13. An investigation of the effects of relevant samples and a comparison of verification versus discovery based lab design

    Science.gov (United States)

    Rieben, James C., Jr.

    This study focuses on the effects of relevance and lab design on student learning within the chemistry laboratory environment. A general chemistry conductivity of solutions experiment and an upper level organic chemistry cellulose regeneration experiment were employed. In the conductivity experiment, the two main variables studied were the effect of relevant (or "real world") samples on student learning and a verification-based lab design versus a discovery-based lab design. With the cellulose regeneration experiment, the effect of a discovery-based lab design vs. a verification-based lab design was the sole focus. Evaluation surveys consisting of six questions were used at three different times to assess student knowledge of experimental concepts. In the general chemistry laboratory portion of this study, four experimental variants were employed to investigate the effect of relevance and lab design on student learning. These variants consisted of a traditional (or verification) lab design, a traditional lab design using "real world" samples, a new lab design employing real world samples/situations using unknown samples, and the new lab design using real world samples/situations that were known to the student. Data used in this analysis were collected during the Fall 08, Winter 09, and Fall 09 terms. For the second part of this study a cellulose regeneration experiment was employed to investigate the effects of lab design. A demonstration creating regenerated cellulose "rayon" was modified and converted to an efficient and low-waste experiment. In the first variant students tested their products and verified a list of physical properties. In the second variant, students filled in a blank physical property chart with their own experimental results for the physical properties. Results from the conductivity experiment show significant student learning of the effects of concentration on conductivity and how to use conductivity to differentiate solution types with the

  14. Engineering Application Way of Faults Knowledge Discovery Based on Rough Set Theory

    International Nuclear Information System (INIS)

    Zhao Rongzhen; Deng Linfeng; Li Chao

    2011-01-01

    For the knowledge acquisition puzzle of intelligence decision-making technology in mechanical industry, to use the Rough Set Theory (RST) as a kind of tool to solve the puzzle was researched. And the way to realize the knowledge discovery in engineering application is explored. A case extracting out the knowledge rules from a concise data table shows out some important information. It is that the knowledge discovery similar to the mechanical faults diagnosis is an item of complicated system engineering project. In where, first of all-important tasks is to preserve the faults knowledge into a table with data mode. And the data must be derived from the plant site and should also be as concise as possible. On the basis of the faults knowledge data obtained so, the methods and algorithms to process the data and extract the knowledge rules from them by means of RST can be processed only. The conclusion is that the faults knowledge discovery by the way is a process of rising upward. But to develop the advanced faults diagnosis technology by the way is a large-scale knowledge engineering project for long time. Every step in which should be designed seriously according to the tool's demands firstly. This is the basic guarantees to make the knowledge rules obtained have the values of engineering application and the studies have scientific significance. So, a general framework is designed for engineering application to go along the route developing the faults knowledge discovery technology.

  15. The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update.

    Science.gov (United States)

    Huynh, Tien; Rigoutsos, Isidore

    2004-07-01

    In this report, we provide an update on the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server, which is operational around the clock, provides access to a large number of methods that have been developed and published by the group's members. There is an increasing number of problems that these tools can help tackle; these problems range from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences, the identification--directly from sequence--of structural deviations from alpha-helicity and the annotation of amino acid sequences for antimicrobial activity. Additionally, annotations for more than 130 archaeal, bacterial, eukaryotic and viral genomes are now available on-line and can be searched interactively. The tools and code bundles continue to be accessible from http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.

  16. Genetic and epigenetic control of gene expression by CRISPR–Cas systems

    Science.gov (United States)

    Lo, Albert; Qi, Lei

    2017-01-01

    The discovery and adaption of bacterial clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated (Cas) systems has revolutionized the way researchers edit genomes. Engineering of catalytically inactivated Cas variants (nuclease-deficient or nuclease-deactivated [dCas]) combined with transcriptional repressors, activators, or epigenetic modifiers enable sequence-specific regulation of gene expression and chromatin state. These CRISPR–Cas-based technologies have contributed to the rapid development of disease models and functional genomics screening approaches, which can facilitate genetic target identification and drug discovery. In this short review, we will cover recent advances of CRISPR–dCas9 systems and their use for transcriptional repression and activation, epigenome editing, and engineered synthetic circuits for complex control of the mammalian genome. PMID:28649363

  17. Zebrafish whole-adult-organism chemogenomics for large-scale predictive and discovery chemical biology.

    Directory of Open Access Journals (Sweden)

    Siew Hong Lam

    2008-07-01

    Full Text Available The ability to perform large-scale, expression-based chemogenomics on whole adult organisms, as in invertebrate models (worm and fly, is highly desirable for a vertebrate model but its feasibility and potential has not been demonstrated. We performed expression-based chemogenomics on the whole adult organism of a vertebrate model, the zebrafish, and demonstrated its potential for large-scale predictive and discovery chemical biology. Focusing on two classes of compounds with wide implications to human health, polycyclic (halogenated aromatic hydrocarbons [P(HAHs] and estrogenic compounds (ECs, we generated robust prediction models that can discriminate compounds of the same class from those of different classes in two large independent experiments. The robust expression signatures led to the identification of biomarkers for potent aryl hydrocarbon receptor (AHR and estrogen receptor (ER agonists, respectively, and were validated in multiple targeted tissues. Knowledge-based data mining of human homologs of zebrafish genes revealed highly conserved chemical-induced biological responses/effects, health risks, and novel biological insights associated with AHR and ER that could be inferred to humans. Thus, our study presents an effective, high-throughput strategy of capturing molecular snapshots of chemical-induced biological states of a whole adult vertebrate that provides information on biomarkers of effects, deregulated signaling pathways, and possible affected biological functions, perturbed physiological systems, and increased health risks. These findings place zebrafish in a strategic position to bridge the wide gap between cell-based and rodent models in chemogenomics research and applications, especially in preclinical drug discovery and toxicology.

  18. Symbiosis-inspired approaches to antibiotic discovery.

    Science.gov (United States)

    Adnani, Navid; Rajski, Scott R; Bugni, Tim S

    2017-07-06

    Covering: 2010 up to 2017Life on Earth is characterized by a remarkable abundance of symbiotic and highly refined relationships among life forms. Defined as any kind of close, long-term association between two organisms, symbioses can be mutualistic, commensalistic or parasitic. Historically speaking, selective pressures have shaped symbioses in which one organism (typically a bacterium or fungus) generates bioactive small molecules that impact the host (and possibly other symbionts); the symbiosis is driven fundamentally by the genetic machineries available to the small molecule producer. The human microbiome is now integral to the most recent chapter in animal-microbe symbiosis studies and plant-microbe symbioses have significantly advanced our understanding of natural products biosynthesis; this also is the case for studies of fungal-microbe symbioses. However, much less is known about microbe-microbe systems involving interspecies interactions. Microbe-derived small molecules (i.e. antibiotics and quorum sensing molecules, etc.) have been shown to regulate transcription in microbes within the same environmental niche, suggesting interspecies interactions whereas, intraspecies interactions, such as those that exploit autoinducing small molecules, also modulate gene expression based on environmental cues. We, and others, contend that symbioses provide almost unlimited opportunities for the discovery of new bioactive compounds whose activities and applications have been evolutionarily optimized. Particularly intriguing is the possibility that environmental effectors can guide laboratory expression of secondary metabolites from "orphan", or silent, biosynthetic gene clusters (BGCs). Notably, many of the studies summarized here result from advances in "omics" technologies and highlight how symbioses have given rise to new anti-bacterial and antifungal natural products now being discovered.

  19. A wavelet-based approach to the discovery of themes and sections in monophonic melodies

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David

    We present the computational method submitted to the MIREX 2014 Discovery of Repeated Themes & Sections task, and the results on the monophonic version of the JKU Patterns Development Database. In the context of pattern discovery in monophonic music, the idea behind our method is that, with a good...

  20. Discovery and optimization of peptide-based anti-cobratoxins

    DEFF Research Database (Denmark)

    Sola, M.; Laustsen, Andreas Hougaard; Johannesen, J.

    More than 5.5 million people per year are victims of snake envenomation, resulting in 125,000 deaths and 400,000amputations worldwide. Antivenoms are still produced by animal immunization procedures, and they areassociated with a high risk of severe adverse reactions. Alternatively, synthetic pep...... peptides may open the possibility for newtherapies with better efficacy and safety. Here, we report the discovery and optimization of a synthetic peptide directedagainst α-cobratoxin (α-CTX), the most toxic component of Monocled cobra (Naja kaouthia)....

  1. Energy-Efficient Cluster-Based Service Discovery in Wireless Sensor Networks

    NARCIS (Netherlands)

    Marin Perianu, Raluca; Scholten, Johan; Havinga, Paul J.M.; Hartel, Pieter H.

    We propose an energy-efficient service discovery protocol for wireless sensor networks. Our solution exploits a cluster overlay, where the clusterhead nodes form a distributed service registry. A service lookup results in visiting only the clusterhead nodes. We aim for minimizing the communication

  2. Energy-Efficient Cluster-Based Service Discovery in Wireless Sensor Networks

    NARCIS (Netherlands)

    Marin Perianu, Raluca; Scholten, Johan; Havinga, Paul J.M.; Hartel, Pieter H.

    2006-01-01

    We propose an energy-efficient service discovery protocol for wireless sensor networks. Our solution exploits a cluster overlay, where the clusterhead nodes form a distributed service registry. A service lookup results in visiting only the clusterhead nodes. We aim for minimizing the communication

  3. Simulated JWST/NIRISS Transit Spectroscopy of Anticipated Tess Planets Compared to Select Discoveries from Space-based and Ground-based Surveys

    Science.gov (United States)

    Louie, Dana R.; Deming, Drake; Albert, Loic; Bouma, L. G.; Bean, Jacob; Lopez-Morales, Mercedes

    2018-04-01

    The Transiting Exoplanet Survey Satellite (TESS) will embark in 2018 on a 2 year wide-field survey mission, discovering over a thousand terrestrial, super-Earth and sub-Neptune-sized exoplanets ({R}pl}≤slant 4 {R}\\oplus ) potentially suitable for follow-up observations using the James Webb Space Telescope (JWST). This work aims to understand the suitability of anticipated TESS planet discoveries for atmospheric characterization by JWST’s Near InfraRed Imager and Slitless Spectrograph (NIRISS) by employing a simulation tool to estimate the signal-to-noise (S/N) achievable in transmission spectroscopy. We applied this tool to Monte Carlo predictions of the TESS expected planet yield and then compared the S/N for anticipated TESS discoveries to our estimates of S/N for 18 known exoplanets. We analyzed the sensitivity of our results to planetary composition, cloud cover, and presence of an observational noise floor. We find that several hundred anticipated TESS discoveries with radii 1.5 {R}\\oplus R}pl}≤slant 2.5 {R}\\oplus will produce S/N higher than currently known exoplanets in this radius regime, such as K2-3b or K2-3c. In the terrestrial planet regime, we find that only a few anticipated TESS discoveries will result in higher S/N than currently known exoplanets, such as the TRAPPIST-1 planets, GJ1132b, and LHS1140b. However, we emphasize that this outcome is based upon Kepler-derived occurrence rates, and that co-planar compact multi-planet systems (e.g., TRAPPIST-1) may be under-represented in the predicted TESS planet yield. Finally, we apply our calculations to estimate the required magnitude of a JWST follow-up program devoted to mapping the transition region between hydrogen-dominated and high molecular weight atmospheres. We find that a modest observing program of between 60 and 100 hr of charged JWST time can define the nature of that transition (e.g., step function versus a power law).

  4. Molecular identification of birds: performance of distance-based DNA barcoding in three genes to delimit parapatric species.

    Directory of Open Access Journals (Sweden)

    Mansour Aliabadian

    Full Text Available DNA barcoding based on the mitochondrial cytochrome oxidase subunit I gene (cox1 or COI has been successful in species identification across a wide array of taxa but in some cases failed to delimit the species boundaries of closely allied allopatric species or of hybridising sister species.In this study we extend the sample size of prior studies in birds for cox1 (2776 sequences, 756 species and target especially species that are known to occur parapatrically, and/or are known to hybridise, on a Holarctic scale. In order to obtain a larger set of taxa (altogether 2719 species, we include also DNA sequences of two other mitochondrial genes: cytochrome b (cob (4614 sequences, 2087 species and 16S (708 sequences, 498 species. Our results confirm the existence of a wide gap between intra- and interspecies divergences for both cox1 and cob, and indicate that distance-based DNA barcoding provides sufficient information to identify and delineate bird species in 98% of all possible pairwise comparisons. This DNA barcoding gap was not statistically influenced by the number of individuals sequenced per species. However, most of the hybridising parapatric species pairs have average divergences intermediate between intraspecific and interspecific distances for both cox1 and cob.DNA barcoding, if used as a tool for species discovery, would thus fail to identify hybridising parapatric species pairs. However, most of them can probably still assigned to known species by character-based approaches, although development of complementary nuclear markers will be necessary to account for mitochondrial introgression in hybridising species.

  5. A high-throughput fluorescence-based assay system for appetite-regulating gene and drug screening.

    Directory of Open Access Journals (Sweden)

    Yasuhito Shimada

    Full Text Available The increasing number of people suffering from metabolic syndrome and obesity is becoming a serious problem not only in developed countries, but also in developing countries. However, there are few agents currently approved for the treatment of obesity. Those that are available are mainly appetite suppressants and gastrointestinal fat blockers. We have developed a simple and rapid method for the measurement of the feeding volume of Danio rerio (zebrafish. This assay can be used to screen appetite suppressants and enhancers. In this study, zebrafish were fed viable paramecia that were fluorescently-labeled, and feeding volume was measured using a 96-well microplate reader. Gene expression analysis of brain-derived neurotrophic factor (bdnf, knockdown of appetite-regulating genes (neuropeptide Y, preproinsulin, melanocortin 4 receptor, agouti related protein, and cannabinoid receptor 1, and the administration of clinical appetite suppressants (fluoxetine, sibutramine, mazindol, phentermine, and rimonabant revealed the similarity among mechanisms regulating appetite in zebrafish and mammals. In combination with behavioral analysis, we were able to evaluate adverse effects on locomotor activities from gene knockdown and chemical treatments. In conclusion, we have developed an assay that uses zebrafish, which can be applied to high-throughput screening and target gene discovery for appetite suppressants and enhancers.

  6. Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

    Science.gov (United States)

    Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

    2013-03-15

    The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter

  7. Usability of Discovery Portals

    OpenAIRE

    Bulens, J.D.; Vullings, L.A.E.; Houtkamp, J.M.; Vanmeulebrouk, B.

    2013-01-01

    As INSPIRE progresses to be implemented in the EU, many new discovery portals are built to facilitate finding spatial data. Currently the structure of the discovery portals is determined by the way spatial data experts like to work. However, we argue that the main target group for discovery portals are not spatial data experts but professionals with limited spatial knowledge, and a focus outside the spatial domain. An exploratory usability experiment was carried out in which three discovery p...

  8. Gene discovery for the bark beetle-vectored fungal tree pathogen Grosmannia clavigera

    Directory of Open Access Journals (Sweden)

    Robertson Gordon

    2010-10-01

    Full Text Available Abstract Background Grosmannia clavigera is a bark beetle-vectored fungal pathogen of pines that causes wood discoloration and may kill trees by disrupting nutrient and water transport. Trees respond to attacks from beetles and associated fungi by releasing terpenoid and phenolic defense compounds. It is unclear which genes are important for G. clavigera's ability to overcome antifungal pine terpenoids and phenolics. Results We constructed seven cDNA libraries from eight G. clavigera isolates grown under various culture conditions, and Sanger sequenced the 5' and 3' ends of 25,000 cDNA clones, resulting in 44,288 high quality ESTs. The assembled dataset of unique transcripts (unigenes consists of 6,265 contigs and 2,459 singletons that mapped to 6,467 locations on the G. clavigera reference genome, representing ~70% of the predicted G. clavigera genes. Although only 54% of the unigenes matched characterized proteins at the NCBI database, this dataset extensively covers major metabolic pathways, cellular processes, and genes necessary for response to environmental stimuli and genetic information processing. Furthermore, we identified genes expressed in spores prior to germination, and genes involved in response to treatment with lodgepole pine phloem extract (LPPE. Conclusions We provide a comprehensively annotated EST dataset for G. clavigera that represents a rich resource for gene characterization in this and other ophiostomatoid fungi. Genes expressed in response to LPPE treatment are indicative of fungal oxidative stress response. We identified two clusters of potentially functionally related genes responsive to LPPE treatment. Furthermore, we report a simple method for identifying contig misassemblies in de novo assembled EST collections caused by gene overlap on the genome.

  9. Gene-based testing of interactions in association studies of quantitative traits.

    Directory of Open Access Journals (Sweden)

    Li Ma

    Full Text Available Various methods have been developed for identifying gene-gene interactions in genome-wide association studies (GWAS. However, most methods focus on individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel interaction tests of quantitative traits that are gene-based and that confer advantage in both statistical power and biological interpretation. The framework of gene-based gene-gene interaction (GGG tests combine marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests are based on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations point to correct type I error rates of all tests and show that the two truncated tests are more powerful than the other tests in cases of markers involved in the underlying interaction not being directly genotyped and in cases of multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein-protein interaction to test for gene-level interactions underlying lipid levels using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies.

  10. Knowledge discovery in traditional Chinese medicine: state of the art and perspectives.

    Science.gov (United States)

    Feng, Yi; Wu, Zhaohui; Zhou, Xuezhong; Zhou, Zhongmei; Fan, Weiyu

    2006-11-01

    As a complementary medical system to Western medicine, traditional Chinese medicine (TCM) provides a unique theoretical and practical approach to the treatment of diseases over thousands of years. Confronted with the increasing popularity of TCM and the huge volume of TCM data, historically accumulated and recently obtained, there is an urgent need to explore these resources effectively by the techniques of knowledge discovery in database (KDD). This paper aims at providing an overview of recent KDD studies in TCM field. A literature search was conducted in both English and Chinese publications, and major studies of knowledge discovery in TCM (KDTCM) reported in these materials were identified. Based on an introduction to the state of the art of TCM data resources, a review of four subfields of KDTCM research was presented, including KDD for the research of Chinese medical formula, KDD for the research of Chinese herbal medicine, KDD for TCM syndrome research, and KDD for TCM clinical diagnosis. Furthermore, the current state and main problems in each subfield were summarized based on a discussion of existing studies, and future directions for each subfield were also proposed accordingly. A series of KDD methods are used in existing KDTCM researches, ranging from conventional frequent itemset mining to state of the art latent structure model. Considerable interesting discoveries are obtained by these methods, such as novel TCM paired drugs discovered by frequent itemset analysis, functional community of related genes discovered under syndrome perspective by text mining, the high proportion of toxic plants in the botanical family Ranunculaceae disclosed by statistical analysis, the association between M-cholinoceptor blocking drug and Solanaceae revealed by association rule mining, etc. It is particularly inspiring to see some studies connecting TCM with biomedicine, which provide a novel top-down view for functional genomics research. However, further developments

  11. Fast Gene Ontology based clustering for microarray experiments

    Directory of Open Access Journals (Sweden)

    Ovaska Kristian

    2008-11-01

    Full Text Available Abstract Background Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. Results We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Conclusion Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  12. Large-scale discovery of promoter motifs in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas A Down

    2007-01-01

    Full Text Available A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

  13. Pharmacogenetics in type 2 diabetes: precision medicine or discovery tool?

    Science.gov (United States)

    Florez, Jose C

    2017-05-01

    In recent years, technological and analytical advances have led to an explosion in the discovery of genetic loci associated with type 2 diabetes. However, their ability to improve prediction of disease outcomes beyond standard clinical risk factors has been limited. On the other hand, genetic effects on drug response may be stronger than those commonly seen for disease incidence. Pharmacogenetic findings may aid in identifying new drug targets, elucidate pathophysiology, unravel disease heterogeneity, help prioritise specific genes in regions of genetic association, and contribute to personalised or precision treatment. In diabetes, precedent for the successful application of pharmacogenetic concepts exists in its monogenic subtypes, such as MODY or neonatal diabetes. Whether similar insights will emerge for the much more common entity of type 2 diabetes remains to be seen. As genetic approaches advance, the progressive deployment of candidate gene, large-scale genotyping and genome-wide association studies has begun to produce suggestive results that may transform clinical practice. However, many barriers to the translation of diabetes pharmacogenetic discoveries to the clinic still remain. This perspective offers a contemporary overview of the field with a focus on sulfonylureas and metformin, identifies the major uses of pharmacogenetics, and highlights potential limitations and future directions.

  14. Systems pharmacology-based drug discovery for marine resources: an example using sea cucumber (Holothurians).

    Science.gov (United States)

    Guo, Yingying; Ding, Yan; Xu, Feifei; Liu, Baoyue; Kou, Zinong; Xiao, Wei; Zhu, Jingbo

    2015-05-13

    Sea cucumber, a kind of marine animal, have long been utilized as tonic and traditional remedies in the Middle East and Asia because of its effectiveness against hypertension, asthma, rheumatism, cuts and burns, impotence, and constipation. In this study, an overall study performed on sea cucumber was used as an example to show drug discovery from marine resource by using systems pharmacology model. The value of marine natural resources has been extensively considered because these resources can be potentially used to treat and prevent human diseases. However, the discovery of drugs from oceans is difficult, because of complex environments in terms of composition and active mechanisms. Thus, a comprehensive systems approach which could discover active constituents and their targets from marine resource, understand the biological basis for their pharmacological properties is necessary. In this study, a feasible pharmacological model based on systems pharmacology was established to investigate marine medicine by incorporating active compound screening, target identification, and network and pathway analysis. As a result, 106 candidate components of sea cucumber and 26 potential targets were identified. Furthermore, the functions of sea cucumber in health improvement and disease treatment were elucidated in a holistic way based on the established compound-target and target-disease networks, and incorporated pathways. This study established a novel strategy that could be used to explore specific active mechanisms and discover new drugs from marine sources. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  15. A Sensitive Assay for Virus Discovery in Respiratory Clinical Samples

    NARCIS (Netherlands)

    de Vries, Michel; Deijs, Martin; Canuti, Marta; van Schaik, Barbera D. C.; Faria, Nuno R.; van de Garde, Martijn D. B.; Jachimowski, Loes C. M.; Jebbink, Maarten F.; Jakobs, Marja; Luyf, Angela C. M.; Coenjaerts, Frank E. J.; Claas, Eric C. J.; Molenkamp, Richard; Koekkoek, Sylvie M.; Lammens, Christine; Leus, Frank; Goossens, Herman; Ieven, Margareta; Baas, Frank; van der Hoek, Lia

    2011-01-01

    In 5-40% of respiratory infections in children, the diagnostics remain negative, suggesting that the patients might be infected with a yet unknown pathogen. Virus discovery cDNA-AFLP (VIDISCA) is a virus discovery method based on recognition of restriction enzyme cleavage sites, ligation of adaptors

  16. Identification of Novel Signal Transduction, Immune Function, and Oxidative Stress Genes and Pathways by Topiramate for Treatment of Methamphetamine Dependence Based on Secondary Outcomes

    Directory of Open Access Journals (Sweden)

    Tianhua Niu

    2017-12-01

    Full Text Available BackgroundTopiramate (TPM is suggested to be a promising medication for treatment of methamphetamine (METH dependence, but the molecular basis remains to be elucidated.MethodsAmong 140 METH-dependent participants randomly assigned to receive either TPM (N = 69 or placebo (N = 71 in a previously conducted randomized controlled trial, 50 TPM- and 49 placebo-treated participants had a total 212 RNA samples available at baseline, week 8, and week 12 time points. Following our primary analysis of gene expression data, we reanalyzed the microarray expression data based on a latent class analysis of binary secondary outcomes during weeks 1–12 that provided a classification of 21 responders and 31 non-responders with consistent responses at both time points.ResultsBased on secondary outcomes, 1,381, 576, 905, and 711 differentially expressed genes at nominal P values < 0.05 were identified in responders versus non-responders for week 8 TPM, week 8 placebo, week 12 TPM, and week 12 placebo groups, respectively. Among 1,381 genes identified in week 8 TPM responders, 359 genes were identified in both week 8 and week 12 TPM groups, of which 300 genes were exclusively detected in TPM responders. Of them, 32 genes had nominal P values < 5 × 10−3 at either week 8 or week 12 and false discovery rates < 0.15 at both time points with consistent directions of gene expression changes, which include GABARAPL1, GPR155, and IL15RA in GABA receptor signaling that represent direct targets for TPM. Analyses of these 300 genes revealed 7 enriched pathways belonging to neuronal function/synaptic plasticity, signal transduction, inflammation/immune function, and oxidative stress response categories. No pathways were enriched for 72 genes exclusively detected in both week 8 and week 12 placebo groups.ConclusionThis secondary analysis study of gene expression data from a TPM clinical trial not only yielded consistent results with those of primary

  17. Discovery of transcription factors and regulatory regions driving in vivo tumor development by ATAC-seq and FAIRE-seq open chromatin profiling.

    Directory of Open Access Journals (Sweden)

    Kristofer Davie

    2015-02-01

    Full Text Available Genomic enhancers regulate spatio-temporal gene expression by recruiting specific combinations of transcription factors (TFs. When TFs are bound to active regulatory regions, they displace canonical nucleosomes, making these regions biochemically detectable as nucleosome-depleted regions or accessible/open chromatin. Here we ask whether open chromatin profiling can be used to identify the entire repertoire of active promoters and enhancers underlying tissue-specific gene expression during normal development and oncogenesis in vivo. To this end, we first compare two different approaches to detect open chromatin in vivo using the Drosophila eye primordium as a model system: FAIRE-seq, based on physical separation of open versus closed chromatin; and ATAC-seq, based on preferential integration of a transposon into open chromatin. We find that both methods reproducibly capture the tissue-specific chromatin activity of regulatory regions, including promoters, enhancers, and insulators. Using both techniques, we screened for regulatory regions that become ectopically active during Ras-dependent oncogenesis, and identified 3778 regions that become (over-activated during tumor development. Next, we applied motif discovery to search for candidate transcription factors that could bind these regions and identified AP-1 and Stat92E as key regulators. We validated the importance of Stat92E in the development of the tumors by introducing a loss of function Stat92E mutant, which was sufficient to rescue the tumor phenotype. Additionally we tested if the predicted Stat92E responsive regulatory regions are genuine, using ectopic induction of JAK/STAT signaling in developing eye discs, and observed that similar chromatin changes indeed occurred. Finally, we determine that these are functionally significant regulatory changes, as nearby target genes are up- or down-regulated. In conclusion, we show that FAIRE-seq and ATAC-seq based open chromatin profiling

  18. NASA Reverb: Standards-Driven Earth Science Data and Service Discovery

    Science.gov (United States)

    Cechini, M. F.; Mitchell, A.; Pilone, D.

    2011-12-01

    NASA's Earth Observing System Data and Information System (EOSDIS) is a core capability in NASA's Earth Science Data Systems Program. NASA's EOS ClearingHOuse (ECHO) is a metadata catalog for the EOSDIS, providing a centralized catalog of data products and registry of related data services. Working closely with the EOSDIS community, the ECHO team identified a need to develop the next generation EOS data and service discovery tool. This development effort relied on the following principles: + Metadata Driven User Interface - Users should be presented with data and service discovery capabilities based on dynamic processing of metadata describing the targeted data. + Integrated Data & Service Discovery - Users should be able to discovery data and associated data services that facilitate their research objectives. + Leverage Common Standards - Users should be able to discover and invoke services that utilize common interface standards. Metadata plays a vital role facilitating data discovery and access. As data providers enhance their metadata, more advanced search capabilities become available enriching a user's search experience. Maturing metadata formats such as ISO 19115 provide the necessary depth of metadata that facilitates advanced data discovery capabilities. Data discovery and access is not limited to simply the retrieval of data granules, but is growing into the more complex discovery of data services. These services include, but are not limited to, services facilitating additional data discovery, subsetting, reformatting, and re-projecting. The discovery and invocation of these data services is made significantly simpler through the use of consistent and interoperable standards. By utilizing an adopted standard, developing standard-specific adapters can be utilized to communicate with multiple services implementing a specific protocol. The emergence of metadata standards such as ISO 19119 plays a similarly important role in discovery as the 19115 standard

  19. First discovery of two polyketide synthase genes for mitorubrinic acid and mitorubrinol yellow pigment biosynthesis and implications in virulence of Penicillium marneffei.

    Directory of Open Access Journals (Sweden)

    Patrick C Y Woo

    Full Text Available BACKGROUND: The genome of P. marneffei, the most important thermal dimorphic fungus causing respiratory, skin and systemic mycosis in China and Southeast Asia, possesses 23 polyketide synthase (PKS genes and 2 polyketide synthase nonribosomal peptide synthase hybrid (PKS-NRPS genes, which is of high diversity compared to other thermal dimorphic pathogenic fungi. We hypothesized that the yellow pigment in the mold form of P. marneffei could also be synthesized by one or more PKS genes. METHODOLOGY/PRINCIPAL FINDINGS: All 23 PKS and 2 PKS-NRPS genes of P. marneffei were systematically knocked down. A loss of the yellow pigment was observed in the mold form of the pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants. Sequence analysis showed that PKS11 and PKS12 are fungal non-reducing PKSs. Ultra high performance liquid chromatography-photodiode array detector/electrospray ionization-quadruple time of flight-mass spectrometry (MS and MS/MS analysis of the culture filtrates of wild type P. marneffei and the pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants showed that the yellow pigment is composed of mitorubrinic acid and mitorubrinol. The survival of mice challenged with the pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants was significantly better than those challenged with wild type P. marneffei (P<0.05. There was also statistically significant decrease in survival of pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants compared to wild type P. marneffei in both J774 and THP1 macrophages (P<0.05. CONCLUSIONS/SIGNIFICANCE: The yellow pigment of the mold form of P. marneffei is composed of mitorubrinol and mitorubrinic acid. This represents the first discovery of PKS genes responsible for mitorubrinol and mitorubrinic acid biosynthesis. pks12 and pks11 are probably responsible for sequential use in the biosynthesis of mitorubrinol and mitorubrinic acid

  20. 19 CFR 354.10 - Discovery.

    Science.gov (United States)

    2010-04-01

    ... 19 Customs Duties 3 2010-04-01 2010-04-01 false Discovery. 354.10 Section 354.10 Customs Duties... ANTIDUMPING OR COUNTERVAILING DUTY ADMINISTRATIVE PROTECTIVE ORDER § 354.10 Discovery. (a) Voluntary discovery. All parties are encouraged to engage in voluntary discovery procedures regarding any matter, not...

  1. 36 CFR 1150.63 - Discovery.

    Science.gov (United States)

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false Discovery. 1150.63 Section... PRACTICE AND PROCEDURES FOR COMPLIANCE HEARINGS Prehearing Conferences and Discovery § 1150.63 Discovery. (a) Parties are encouraged to engage in voluntary discovery procedures. For good cause shown under...

  2. 37 CFR 11.52 - Discovery.

    Science.gov (United States)

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Discovery. 11.52 Section 11... Disciplinary Proceedings; Jurisdiction, Sanctions, Investigations, and Proceedings § 11.52 Discovery. Discovery... establishes that discovery is reasonable and relevant, the hearing officer, under such conditions as he or she...

  3. Pattern Discovery in Time-Ordered Data; TOPICAL

    International Nuclear Information System (INIS)

    CONRAD, GREGORY N.; BRITANIK, JOHN M.; DELAND, SHARON M.; JENKIN, CHRISTINA L.

    2002-01-01

    This report describes the results of a Laboratory-Directed Research and Development project on techniques for pattern discovery in discrete event time series data. In this project, we explored two different aspects of the pattern matching/discovery problem. The first aspect studied was the use of Dynamic Time Warping for pattern matching in continuous data. In essence, DTW is a technique for aligning time series along the time axis to optimize the similarity measure. The second aspect studied was techniques for discovering patterns in discrete event data. We developed a pattern discovery tool based on adaptations of the A-priori and GSP (Generalized Sequential Pattern mining) algorithms. We then used the tool on three different application areas-unattended monitoring system data from a storage magazine, computer network intrusion detection, and analysis of robot training data

  4. Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers.

    Science.gov (United States)

    Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna

    2017-03-01

    Large collections of data in studies on cancer such as leukaemia provoke the necessity of applying tailored analysis algorithms to ensure supreme information extraction. In this work, a custom-fit pipeline is demonstrated for thorough investigation of the voluminous MILE gene expression data set. Three analyses are accomplished, each for gaining a deeper understanding of the processes underlying leukaemia types and subtypes. First, the main disease groups are tested for differential expression against the healthy control as in a standard case-control study. Here, the basic knowledge on molecular mechanisms is confirmed quantitatively and by literature references. Second, pairwise comparison testing is performed for juxtaposing the main leukaemia types among each other. In this case by means of the Dice coefficient similarity measure the general relations are pointed out. Moreover, lists of candidate main leukaemia group biomarkers are proposed. Finally, with this approach being successful, the third analysis provides insight into all of the studied subtypes, followed by the emergence of four leukaemia subtype biomarkers. In addition, the class enhanced DEG signature obtained on the basis of novel pipeline processing leads to significantly better classification power of multi-class data classifiers. The developed methodology consisting of batch effect adjustment, adaptive noise and feature filtration coupled with adequate statistical testing and biomarker definition proves to be an effective approach towards knowledge discovery in high-throughput molecular biology experiments.

  5. Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community

    Energy Technology Data Exchange (ETDEWEB)

    Allgaier, M.; Reddy, A.; Park, J. I.; Ivanova, N.; D' haeseleer, P.; Lowry, S.; Sapra, R.; Hazen, T.C.; Simmons, B.A.; VanderGheynst, J. S.; Hugenholtz, P.

    2009-11-15

    Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Small-subunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC) with some bacterial populations being enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, {approx}10% were putative cellulases mostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity for one on carboxymethyl cellulose. The active GH9 enzyme has a temperature optimum of 50 C and pH range of 5.5 to 8 consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition using simulated composting condition and that full-length genes can be identified from complex metagenomic sequence data, synthesized and expressed resulting in active enzyme.

  6. Targeted Discovery of Glycoside Hydrolases from a Switchgrass-Adapted Compost Community

    Energy Technology Data Exchange (ETDEWEB)

    Reddy, Amitha; Allgaier, Martin; Park, Joshua I.; Ivanoval, Natalia; Dhaeseleer, Patrik; Lowry, Steve; Sapra, Rajat; Hazen, Terry C.; Simmons, Blake A.; VanderGheynst, Jean S.; Hugenholtz, Philip

    2011-05-11

    Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Smallsubunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC) with some bacterial populations being enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, ,10percent were putative cellulasesmostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity for one on carboxymethyl cellulose. The active GH9 enzyme has a temperature optimum of 50uC and pH range of 5.5 to 8 consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition using simulated composting condition and that full-length genes can be identified from complex metagenomic sequence data, synthesized and expressed resulting in active enzyme.

  7. Usability of Discovery Portals

    NARCIS (Netherlands)

    Bulens, J.D.; Vullings, L.A.E.; Houtkamp, J.M.; Vanmeulebrouk, B.

    2013-01-01

    As INSPIRE progresses to be implemented in the EU, many new discovery portals are built to facilitate finding spatial data. Currently the structure of the discovery portals is determined by the way spatial data experts like to work. However, we argue that the main target group for discovery portals

  8. The Tangled Tale of Genes and Environment: Moore's The Dependent Gene: The Fallacy of “nature VS. Nurture”

    Science.gov (United States)

    Schneider, Susan M

    2007-01-01

    Nature–nurture views that smack of genetic determinism remain prevalent. Yet, the increasing knowledge base shows ever more clearly that environmental factors and genes form a fully interactional system at all levels. Moore's book covers the major topics of discovery and dispute, including behavior genetics and the twin studies, developmental psychobiology, and developmental systems theory. Knowledge of this larger life-sciences context for behavior principles will become increasingly important as the full complexity of gene–environment relations is revealed. Behavior analysis both contributes to and gains from the larger battle for the recognition of how nature and nurture really work.

  9. 14 CFR 16.213 - Discovery.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Discovery. 16.213 Section 16.213... PRACTICE FOR FEDERALLY-ASSISTED AIRPORT ENFORCEMENT PROCEEDINGS Hearings § 16.213 Discovery. (a) Discovery... discovery permitted by this section if a party shows that— (1) The information requested is cumulative or...

  10. 28 CFR 76.21 - Discovery.

    Science.gov (United States)

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Discovery. 76.21 Section 76.21 Judicial... POSSESSION OF CERTAIN CONTROLLED SUBSTANCES § 76.21 Discovery. (a) Scope. Discovery under this part covers... as a general guide for discovery practices in proceedings before the Judge. However, unless otherwise...

  11. Two-Way Gene Interaction From Microarray Data Based on Correlation Methods.

    Science.gov (United States)

    Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh

    2016-06-01

    Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman's rank correlation coefficient and Blomqvist's measure, and compared them with Pearson's correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson's correlation, Spearman's rank correlation, and Blomqvist's coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist's coefficient was not confirmed by visual methods. Some results of the correlation coefficients are not the same with visualization. The reason may be due to the small number of data.

  12. Mass spectrometry for protein quantification in biomarker discovery.

    Science.gov (United States)

    Wang, Mu; You, Jinsam

    2012-01-01

    Major technological advances have made proteomics an extremely active field for biomarker discovery in recent years due primarily to the development of newer mass spectrometric technologies and the explosion in genomic and protein bioinformatics. This leads to an increased emphasis on larger scale, faster, and more efficient methods for detecting protein biomarkers in human tissues, cells, and biofluids. Most current proteomic methodologies for biomarker discovery, however, are not highly automated and are generally labor-intensive and expensive. More automation and improved software programs capable of handling a large amount of data are essential to reduce the cost of discovery and to increase throughput. In this chapter, we discuss and describe mass spectrometry-based proteomic methods for quantitative protein analysis.

  13. Web-based services for drug design and discovery.

    Science.gov (United States)

    Frey, Jeremy G; Bird, Colin L

    2011-09-01

    Reviews of the development of drug discovery through the 20(th) century recognised the importance of chemistry and increasingly bioinformatics, but had relatively little to say about the importance of computing and networked computing in particular. However, the design and discovery of new drugs is arguably the most significant single application of bioinformatics and cheminformatics to have benefitted from the increases in the range and power of the computational techniques since the emergence of the World Wide Web, commonly now referred to as simply 'the Web'. Web services have enabled researchers to access shared resources and to deploy standardized calculations in their search for new drugs. This article first considers the fundamental principles of Web services and workflows, and then explores the facilities and resources that have evolved to meet the specific needs of chem- and bio-informatics. This strategy leads to a more detailed examination of the basic components that characterise molecules and the essential predictive techniques, followed by a discussion of the emerging networked services that transcend the basic provisions, and the growing trend towards embracing modern techniques, in particular the Semantic Web. In the opinion of the authors, the issues that require community action are: increasing the amount of chemical data available for open access; validating the data as provided; and developing more efficient links between the worlds of cheminformatics and bioinformatics. The goal is to create ever better drug design services.

  14. 40 CFR 27.21 - Discovery.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 1 2010-07-01 2010-07-01 false Discovery. 27.21 Section 27.21... Discovery. (a) The following types of discovery are authorized: (1) Requests for production of documents for..., discovery is available only as ordered by the presiding officer. The presiding officer shall regulate the...

  15. 37 CFR 41.150 - Discovery.

    Science.gov (United States)

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Discovery. 41.150 Section 41... COMMERCE PRACTICE BEFORE THE BOARD OF PATENT APPEALS AND INTERFERENCES Contested Cases § 41.150 Discovery. (a) Limited discovery. A party is not entitled to discovery except as authorized in this subpart. The...

  16. 14 CFR 13.220 - Discovery.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Discovery. 13.220 Section 13.220... INVESTIGATIVE AND ENFORCEMENT PROCEDURES Rules of Practice in FAA Civil Penalty Actions § 13.220 Discovery. (a) Initiation of discovery. Any party may initiate discovery described in this section, without the consent or...

  17. 49 CFR 604.38 - Discovery.

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 7 2010-10-01 2010-10-01 false Discovery. 604.38 Section 604.38 Transportation... TRANSPORTATION CHARTER SERVICE Hearings. § 604.38 Discovery. (a) Permissible forms of discovery shall be within the discretion of the PO. (b) The PO shall limit the frequency and extent of discovery permitted by...

  18. 15 CFR 719.10 - Discovery.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Discovery. 719.10 Section 719.10... Discovery. (a) General. The parties are encouraged to engage in voluntary discovery regarding any matter... the Federal Rules of Civil Procedure relating to discovery apply to the extent consistent with this...

  19. 24 CFR 26.18 - Discovery.

    Science.gov (United States)

    2010-04-01

    ... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Discovery. 26.18 Section 26.18... PROCEDURES Hearings Before Hearing Officers Discovery § 26.18 Discovery. (a) General. The parties are encouraged to engage in voluntary discovery procedures, which may commence at any time after an answer has...

  20. 42 CFR 426.532 - Discovery.

    Science.gov (United States)

    2010-10-01

    ... 42 Public Health 3 2010-10-01 2010-10-01 false Discovery. 426.532 Section 426.532 Public Health... § 426.532 Discovery. (a) General rule. If the Board orders discovery, the Board must establish a reasonable timeframe for discovery. (b) Protective order—(1) Request for a protective order. Any party...