WorldWideScience

Sample records for text-mining assisted regulatory

  1. Text-mining-assisted biocuration workflows in Argo

    Science.gov (United States)

    Rak, Rafal; Batista-Navarro, Riza Theresa; Rowley, Andrew; Carter, Jacob; Ananiadou, Sophia

    2014-01-01

    Biocuration activities have been broadly categorized into the selection of relevant documents, the annotation of biological concepts of interest and identification of interactions between the concepts. Text mining has been shown to have a potential to significantly reduce the effort of biocurators in all the three activities, and various semi-automatic methodologies have been integrated into curation pipelines to support them. We investigate the suitability of Argo, a workbench for building text-mining solutions with the use of a rich graphical user interface, for the process of biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics to form task-specific processing units. A built-in manual annotation editor is the single most used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components. Apart from syntactic and semantic analytics, the ever-growing library of components includes several data readers and consumers that support well-established as well as emerging data interchange formats such as XMI, RDF and BioC, which facilitate the interoperability of Argo with other platforms or resources. To validate the suitability of Argo for curation activities, we participated in the BioCreative IV challenge whose purpose was to evaluate Web-based systems addressing user-defined biocuration tasks. Argo proved to have the edge over other systems in terms of flexibility of defining biocuration tasks. As expected, the versatility of the workbench inevitably lengthened the time the curators spent on learning the system before taking on the task, which may have affected the usability of Argo. The participation in the challenge gave us an opportunity to gather valuable feedback and identify areas of improvement, some of which have already been introduced. 
Database URL: http://argo.nactem.ac.uk PMID

  2. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  3. Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

    Science.gov (United States)

    Elayavilli, Ravikumar Komandur; Liu, Hongfang

    2016-01-01

    Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source for quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain impedes normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to the domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining to a formal representation that may help in constructing an ontology for ion channel events using a rule-based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), and the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers potential to facilitate the integration of text mining into the ontological workflow, a novel aspect of this study. This work is a case study where we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in ontological

  4. [Text mining, a method for computer-assisted analysis of scientific texts, demonstrated by an analysis of author networks].

    Science.gov (United States)

    Hahn, P; Dullweber, F; Unglaub, F; Spies, C K

    2014-06-01

    Searching for relevant publications is becoming more difficult with the increasing number of scientific articles. Text mining as a specific form of computer-based data analysis may be helpful in this context. Highlighting relations between authors and finding relevant publications concerning a specific subject using text analysis programs are illustrated graphically by two worked examples. © Georg Thieme Verlag KG Stuttgart · New York.

  5. ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

    Science.gov (United States)

    2012-01-01

    Clinical trials are mandatory protocols describing medical research on humans and are among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, which in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols. PMID:22595088

  6. Contextual Text Mining

    Science.gov (United States)

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  7. tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

    Science.gov (United States)

    Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J; Stefancsik, Raymund; Millburn, Gillian H; Rost, Burkhard

    2014-01-01

    The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.

  8. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives.  The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning

  9. Text Mining in Organizational Research.

    Science.gov (United States)

    Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N

    2018-07-01

    Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.

  10. Biomarker Identification Using Text Mining

    Directory of Open Access Journals (Sweden)

    Hui Li

    2012-01-01

    Identifying molecular biomarkers has become one of the important tasks for scientists to assess the different phenotypic states of cells or organisms correlated to the genotypes of diseases from large-scale biological data. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our method of text mining provides a highly reliable approach to discover the biomarkers in the PubMed database.
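
    The dictionary-plus-finite-state-machine idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the dictionary is a plain list of term strings and models the state machine as a token-level trie that is walked to find the longest match at each position. The names `build_trie`, `find_terms`, and the three sample terms are hypothetical.

```python
import re

def build_trie(terms):
    """Build a token-level trie; each node is a dict and '$' marks a complete term."""
    root = {}
    for term in terms:
        node = root
        for tok in term.lower().split():
            node = node.setdefault(tok, {})
        node["$"] = term  # accepting state: remember the canonical spelling
    return root

def find_terms(text, trie):
    """Scan tokens left to right, emitting the longest dictionary match at each position."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text.lower())
    hits, i = [], 0
    while i < len(tokens):
        node, j, best = trie, i, None
        # Follow transitions while the next token is a valid edge from the current state.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if "$" in node:
                best = (node["$"], j)
        if best:
            hits.append(best[0])
            i = best[1]  # resume after the matched term
        else:
            i += 1
    return hits

# Hypothetical dictionary entries, chosen only for illustration.
BIOMARKERS = ["PSA", "HER2", "C-reactive protein"]
TRIE = build_trie(BIOMARKERS)
```

    Calling `find_terms("Elevated C-reactive protein and PSA levels were observed.", TRIE)` would return the two matched dictionary terms in order of appearance. A trie keeps lookup cost proportional to the text length rather than to the dictionary size, which is one plausible reason to prefer a state machine over repeated substring search.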

  11. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  12. Text mining by Tsallis entropy

    Science.gov (United States)

    Jamaati, Maryam; Mehri, Ali

    2018-01-01

    Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics in text mining. Tsallis entropy appropriately ranks the terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word-ranking metric in order to extract the keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word-ranking performance, at the same level as the best previous ranking methods.
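
    For readers unfamiliar with the quantity, the standard Tsallis entropy of a discrete distribution p with index q is S_q = (1 - sum_i p_i^q) / (q - 1), which reduces to the Shannon entropy as q approaches 1. The sketch below computes it. The abstract does not say how the authors derive a distribution for each word, so `occurrence_distribution`, which spreads a word's occurrences over equal-sized segments of the text, is purely an illustrative assumption and not the paper's method.

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1); Shannon entropy in the limit q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)

def occurrence_distribution(tokens, word, n_parts):
    """Assumed helper: fraction of a word's occurrences falling in each text segment."""
    size = max(1, math.ceil(len(tokens) / n_parts))
    counts = [tokens[i:i + size].count(word) for i in range(0, len(tokens), size)]
    total = sum(counts)
    return [c / total for c in counts] if total else []
```

    Under this assumption, a word spread evenly over the text yields a high entropy and a word clustered in one segment yields an entropy near zero; how such values map to keyword rank is left to the original paper.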

  13. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

    2009-01-01

    Accelerating hardware devices represent a novel promise for improving performance in many problem domains, but it is not clear which accelerators are suitable for which domains. With no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on the use of atomic instructions that have recently become available in NVIDIA devices.

  14. Text Mining for Protein Docking.

    Directory of Open Access Journals (Sweden)

    Varsha D Badal

    2015-12-01

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. 
The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound

  15. Biomedical text mining and its applications in cancer research.

    Science.gov (United States)

    Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

    2013-04-01

    Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100 years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. Science and Technology Text Mining Basic Concepts

    National Research Council Canada - National Science Library

    Losiewicz, Paul

    2003-01-01

    ...). It then presents some of the most widely used data and text mining techniques, including clustering and classification methods, such as nearest neighbor, relational learning models, and genetic...

  17. Text mining for the biocuration workflow.

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

  18. Frontiers of biomedical text mining: current progress

    Science.gov (United States)

    Zweigenbaum, Pierre; Demner-Fushman, Dina; Yu, Hong; Cohen, Kevin B.

    2008-01-01

    It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year. PMID:17977867

  19. Text Mining of Supreme Administrative Court Jurisdictions

    OpenAIRE

    Feinerer, Ingo; Hornik, Kurt

    2007-01-01

    Within the last decade text mining, i.e., extracting sensitive information from text corpora, has become a major factor in business intelligence. The automated textual analysis of law corpora is highly valuable because of its impact on a company's legal options and the raw amount of available jurisdiction. The study of supreme court jurisdiction and international law corpora is equally important due to its effects on business sectors. In this paper we use text mining methods to investigate Au...

  20. Text mining resources for the life sciences.

    Science.gov (United States)

    Przybyła, Piotr; Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia

    2016-01-01

    Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable, namely those that have the crucial ability to share information, enabling smooth integration and reusability. © The Author(s) 2016. Published by Oxford University Press.

  1. Chapter 16: text mining for translational bioinformatics.

    Science.gov (United States)

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  2. Text mining resources for the life sciences

    Science.gov (United States)

    Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia

    2016-01-01

    Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable—those that have the crucial ability to share information, enabling smooth integration and reusability. PMID:27888231

  3. Text mining for the biocuration workflow

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  4. Benchmarking infrastructure for mutation text mining.

    Science.gov (United States)

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
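
    The performance metrics mentioned in this abstract are typically precision, recall, and F-measure over sets of annotations. The infrastructure described computes them with SPARQL over RDF; the sketch below shows the same arithmetic in plain Python purely as an illustration. The annotation tuples of the form (document id, start offset, end offset, mutation string) are an assumed representation, not the schema of the authors' ontology.

```python
def prf(gold, predicted):
    """Return (precision, recall, F1) for exact-match annotation sets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # annotations found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical annotations: (document id, start, end, mutation mention).
GOLD = {("doc1", 10, 15, "V600E"), ("doc1", 40, 45, "G12D"),
        ("doc2", 5, 10, "R175H"), ("doc2", 30, 35, "T790M")}
PREDICTED = {("doc1", 10, 15, "V600E"), ("doc1", 40, 45, "G12D"),
             ("doc2", 50, 55, "L858R")}
```

    With these sample sets, two of the three predictions are correct and two of the four gold annotations are found. Exact matching on offsets is the strictest choice; a benchmark may also define relaxed matching, which this sketch does not cover.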

  5. Benchmarking infrastructure for mutation text mining

    Science.gov (United States)

    2014-01-01

    Background: Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results: We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion: We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600

  6. CONAN : Text Mining in the Biomedical Domain

    NARCIS (Netherlands)

    Malik, R.

    2006-01-01

    This thesis is about text mining: extracting important information from literature. In recent years, the number of biomedical articles and journals has grown exponentially. Scientists might not find the information they want because of the large number of publications. Therefore a system was

  7. Text mining patents for biomedical knowledge.

    Science.gov (United States)

    Rodriguez-Esteban, Raul; Bundschus, Markus

    2016-06-01

    Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Using ontology network structure in text mining.

    Science.gov (United States)

    Berndt, Donald J; McCart, James A; Luther, Stephen L

    2010-11-13

    Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
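
    The hybrid strategy in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy ontology, the function names and the way importance is multiplied into raw term counts are all assumptions made for illustration. It computes PageRank by power iteration over a term graph and then uses the scores to weight a bag-of-words vector.

```python
from collections import Counter


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = sorted(set(graph) | {n for targets in graph.values() for n in targets})
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for source, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


def weighted_term_vector(text, importance):
    """Scale raw term counts by each term's importance in the ontology graph."""
    counts = Counter(text.lower().split())
    return {t: c * importance[t] for t, c in counts.items() if t in importance}


# Hypothetical toy ontology: each term points to its parent concept.
ontology = {
    "cigarette": ["tobacco"],
    "cigar": ["tobacco"],
    "tobacco": ["substance"],
    "nicotine": ["substance"],
    "substance": [],
}
importance = pagerank(ontology)
features = weighted_term_vector("cigarette cigarette tobacco aspirin", importance)
```

    Terms absent from the ontology (here "aspirin") are simply dropped; a real pipeline would more likely keep them with a default weight, which is a design choice the abstract does not specify.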

  9. Financial Statement Fraud Detection using Text Mining

    OpenAIRE

    Rajan Gupta; Nasib Singh Gill

    2013-01-01

    Data mining techniques have been used extensively by the research community to detect financial statement fraud. Most of the research in this direction has used the numbers (quantitative information), i.e. the financial ratios present in the financial statements, for detecting fraud. There is very little or no research on the analysis of text such as auditor’s comments or notes present in published reports. In this study we propose a text mining approach for detecting financial statement frau...

  10. A Customizable Text Classifier for Text Mining

    Directory of Open Access Journals (Sweden)

    Yun-liang Zhang

    2007-12-01

    Text mining deals with complex and unstructured texts. Usually a particular collection of texts that is specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard with which to choose the texts based on demand and abundance of materials. The performance of the classifier varies with the user's choice.

  11. Text mining with R a tidy approach

    CERN Document Server

    Silge, Julia

    2017-01-01

    Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media. Learn how to apply the tidy text format to NLP. Use sentiment analysis to mine the emotional content of text. Identify a document's most important terms with frequency measurements. E...

  12. Text Mining Metal-Organic Framework Papers.

    Science.gov (United States)

    Park, Sanghoon; Kim, Baekjun; Choi, Sihoon; Boyd, Peter G; Smit, Berend; Kim, Jihan

    2018-02-26

    We have developed a simple text mining algorithm that allows us to identify surface area and pore volumes of metal-organic frameworks (MOFs) using manuscript html files as inputs. The algorithm searches for common units (e.g., m²/g, cm³/g) associated with these two quantities to facilitate the search. From the sample set data of over 200 MOFs, the algorithm managed to identify 90% and 88.8% of the correct surface area and pore volume values, respectively. Further application to a test set of randomly chosen MOF html files yielded 73.2% and 85.1% accuracies for the two respective quantities. Most of the errors stem from unorthodox sentence structures that made it difficult to identify the correct data, as well as bolded notations of MOFs (e.g., 1a) that made it difficult to identify the real name of the MOF. These types of tools will become useful when it comes to discovering structure-property relationships among MOFs as well as collecting a large set of data for references.
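
    A unit-driven search of this kind might look like the following sketch. It is only an illustration under assumptions: the regular expressions, the function name and the sample sentence are invented here, and the published algorithm is likely to handle many more notations and to work on HTML rather than plain strings.

```python
import re

# A number such as "1250", "1,250" or "0.52".
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"

# Units may appear as "m2/g", "m 2 /g" or "m²/g" after text extraction.
SURFACE_AREA = re.compile(NUMBER + r"\s*m\s*(?:2|²)\s*/\s*g")
PORE_VOLUME = re.compile(NUMBER + r"\s*cm\s*(?:3|³)\s*/\s*g")


def to_float(value):
    """Convert a matched number to float, dropping thousands separators."""
    return float(value.replace(",", ""))


def extract_properties(text):
    """Return every value that is directly followed by one of the two units."""
    return {
        "surface_area_m2_per_g": [to_float(v) for v in SURFACE_AREA.findall(text)],
        "pore_volume_cm3_per_g": [to_float(v) for v in PORE_VOLUME.findall(text)],
    }


# Hypothetical sentence, not taken from any paper.
sample = "The BET surface area of 1 was 1,250 m 2 /g, with a pore volume of 0.52 cm3/g."
found = extract_properties(sample)
```

    Anchoring on the unit rather than on the property name keeps the patterns short, but it also means a bolded compound label such as "1" in the sample is ignored only because no unit follows it, which matches the kind of error source the abstract mentions.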

  13. Text Mining the History of Medicine.

    Science.gov (United States)

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while

  14. Unsupervised text mining for assessing and augmenting GWAS results.

    Science.gov (United States)

    Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence

    2016-04-01

    Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has helped to identify many genes associated with multifactorial diseases. These studies make it possible to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value<0.01). The clustering of observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
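
    The comparison of candidate genes against random genes can be sketched as follows. This is a minimal illustration, assuming each gene is represented by a sparse term-weight vector built from its literature; the gene names, the vectors and the resampling scheme used for the one-sided p-value are hypothetical and are not taken from the paper.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(genes, vectors):
    """Average cosine similarity over all unordered pairs of genes."""
    pairs = list(combinations(genes, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def empirical_p_value(candidates, vectors, trials=1000, seed=0):
    """One-sided p-value: how often random gene sets are at least as similar."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(candidates, vectors)
    pool = sorted(vectors)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(pool, len(candidates))
        if mean_pairwise_similarity(sample, vectors) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Hypothetical term-weight vectors for five made-up genes.
vectors = {
    "G1": {"asthma": 2.0, "airway": 1.0},
    "G2": {"asthma": 1.0, "airway": 2.0},
    "G3": {"bone": 1.0},
    "G4": {"retina": 1.0},
    "G5": {"liver": 1.0},
}
observed, p_value = empirical_p_value(["G1", "G2"], vectors)
```

    With a realistic background set of thousands of genes, the same procedure gives a finer-grained p-value; the toy pool of five genes is only large enough to show the mechanics.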

  15. PubRunner: A light-weight framework for updating text mining results.

    Science.gov (United States)

    Anekalla, Kishore R; Courneya, J P; Fiorini, Nicolas; Lever, Jake; Muchow, Michael; Busby, Ben

    2017-01-01

    Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.

  16. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    Science.gov (United States)

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

  17. Text mining from ontology learning to automated text processing applications

    CERN Document Server

    Biemann, Chris

    2014-01-01

    This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects

  18. Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

    Science.gov (United States)

    Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M

    2012-10-01

    In biological research, establishing the prior art by searching and collecting information already present in the domain has equal importance as the experiments done. To obtain a complete overview about the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from

  19. Text mining meets workflow: linking U-Compare with Taverna

    Science.gov (United States)

    Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690

  20. Working with text tools, techniques and approaches for text mining

    CERN Document Server

    Tourte, Gregory J L

    2016-01-01

    Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTeM (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...

  1. Cultural text mining: using text mining to map the emergence of transnational reference cultures in public media repositories

    NARCIS (Netherlands)

    Pieters, Toine; Verheul, Jaap

    2014-01-01

    This paper discusses the research project Translantis, which uses innovative technologies for cultural text mining to analyze large repositories of digitized public media, such as newspapers and journals. The Translantis research team uses and develops the text mining tool Texcavator, which is

  2. Text mining for biology--the way forward

    DEFF Research Database (Denmark)

    Altman, Russ B; Bergman, Casey M; Blake, Judith

    2008-01-01

    This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify...... several broad themes, including the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text mining technology and inserting it into larger...

  3. Supporting the education evidence portal via text mining

    Science.gov (United States)

    Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

    2010-01-01

    The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679

  4. Text mining in cancer gene and pathway prioritization.

    Science.gov (United States)

    Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

    2014-01-01

    Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.

  5. A STUDY OF TEXT MINING METHODS, APPLICATIONS, AND TECHNIQUES

    OpenAIRE

    R. Rajamani; S. Saranya

    2017-01-01

    Data mining is used to extract useful information from large amounts of data. It is used to implement and solve different types of research problems. The research areas related to data mining are text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining, also referred to as text data mining, is also called knowledge discovery in text (KDT) or intelligent text analysis. T...

  6. OntoGene web services for biomedical text mining.

    Science.gov (United States)

    Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul

    2014-01-01

    Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top-ranked results in several of them.

  7. Application of text mining in the biomedical domain.

    Science.gov (United States)

    Fleuren, Wilco W M; Alkema, Wynand

    2015-03-01

    In recent years the amount of experimental data that is produced in biomedical research and the number of papers that are being published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers turn more and more to the use of automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and nowadays can be used to address a variety of research questions ranging from de novo drug target discovery to enhanced biological interpretation of the results from high throughput experiments. In this paper we introduce the most important techniques that are used for text mining and give an overview of the text mining tools that are currently being used and the type of problems they are typically applied to. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. MeSHmap: a text mining tool for MEDLINE.

    OpenAIRE

    Srinivasan, P.

    2001-01-01

    Our research goal is to explore text mining from the metadata included in MEDLINE documents. We present MeSHmap, our prototype text mining system that exploits the MeSH indexing accompanying MEDLINE records. MeSHmap supports searches via PubMed followed by user-driven exploration of the MeSH terms and subheadings in the retrieved set. The potential of the system goes beyond text retrieval. It may also be used to compare entities of the same type such as pairs of drugs or pairs of procedures et...

  9. SparkText: Biomedical Text Mining on Big Data Framework

    Science.gov (United States)

    He, Karen Y.; Wang, Kai

    2016-01-01

    Background: Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
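
    To make the classification step concrete, here is a minimal single-machine sketch of a multinomial Naïve Bayes text classifier, one of the three model families named in the abstract. It does not use Apache Spark or Cassandra and is not SparkText's code; the class name, the whitespace tokenizer and the toy training sentences are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc.lower().split())
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence and are skipped.
        tokens = [t for t in doc.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, far smaller than any real corpus.
train_docs = [
    "breast tumor mammography screening",
    "brca1 mutation breast carcinoma",
    "prostate psa screening biopsy",
    "prostate androgen therapy",
    "lung smoking nodule ct",
]
train_labels = ["breast", "breast", "prostate", "prostate", "lung"]
model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
prediction = model.predict("psa levels in prostate biopsy")
```

    Working in log space avoids numerical underflow on long documents, which matters once the inputs are full-text articles rather than four-word strings.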

  10. Text mining in the classification of digital documents

    Directory of Open Access Journals (Sweden)

    Marcial Contreras Barrera

    2016-11-01

    Objective: To develop an automated classifier for bibliographic material by means of text mining. Methodology: Text mining is used to develop the classifier, based on a supervised method consisting of two phases, learning and recognition. In the learning phase, the classifier learns patterns by analyzing bibliographic records of class Z (library science, information sciences and information resources) retrieved from the LIBRUNAM database; this phase yields a classifier capable of recognizing different LC subclasses. In the recognition phase, the classifier is validated and evaluated through classification tests: bibliographic records of class Z are selected at random, classified by a cataloguer and processed by the automated classifier, in order to measure the precision of the automated classifier. Results: Applying text mining produced an automated classifier based on a supervised document classification method. The precision of the classifier, calculated by comparing the manually and automatically assigned topics, was 75.70%. Conclusions: The application of text mining facilitated the creation of the automated classifier, yielding a useful technology for classifying bibliographic material with the aim of improving and speeding up the process of organizing digital documents.

  11. Text mining of web-based medical content

    CERN Document Server

    Neustein, Amy

    2014-01-01

    Text Mining of Web-Based Medical Content examines web mining for extracting useful information that can be used for treating and monitoring the healthcare of patients. This work provides methodological approaches to designing mapping tools that exploit data found in social media postings. Specific linguistic features of medical postings are analyzed vis-a-vis available data extraction tools for culling useful information.

  12. SparkText: Biomedical Text Mining on Big Data Framework.

    Directory of Open Access Journals (Sweden)

    Zhan Ye

    Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.

  13. Using Text Mining to Characterize Online Discussion Facilitation

    Science.gov (United States)

    Ming, Norma; Baumer, Eric

    2011-01-01

    Facilitating class discussions effectively is a critical yet challenging component of instruction, particularly in online environments where student and faculty interaction is limited. Our goals in this research were to identify facilitation strategies that encourage productive discussion, and to explore text mining techniques that can help…

  14. SparkText: Biomedical Text Mining on Big Data Framework.

    Science.gov (United States)

    Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M

    Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.

  15. The Application of Text Mining in Business Research

    DEFF Research Database (Denmark)

    Preuss, Bjørn

    2017-01-01

    The aim of this paper is to present a methodological concept in business research that has the potential to become one of the most powerful methods in the upcoming years when it comes to researching qualitative phenomena in business and society. It presents a selection of algorithms as well elaborat...... on potential use cases for a text mining based approach to qualitative data analysis....

  16. Identifying child abuse through text mining and machine learning

    NARCIS (Netherlands)

    Amrit, Chintan; Paauw, Tim; Aly, Robin; Lavric, Miha

    2017-01-01

    In this paper, we describe how we used text mining and analysis to identify and predict cases of child abuse in a public health institution. Such institutions in the Netherlands try to identify and prevent different kinds of abuse. A significant part of the medical data that the institutions have on

  17. Text Mining of Journal Articles for Sleep Disorder Terminologies.

    Directory of Open Access Journals (Sweden)

    Calvin Lam

    Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.

  18. Text Mining of Journal Articles for Sleep Disorder Terminologies.

    Science.gov (United States)

    Lam, Calvin; Lai, Fu-Chih; Wang, Chia-Hui; Lai, Mei-Hsin; Hsu, Nanly; Chung, Min-Huey

    2016-01-01

    Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
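    For readers unfamiliar with the three evaluation figures reported for MetaMap above, the following sketch shows how precision, recall, and false positive rate are conventionally derived from confusion-matrix counts. The counts are hypothetical and are not taken from the study; the function name is likewise an assumption made for illustration.

```python
def extraction_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and false positive rate from raw counts.

    tp: terms extracted and correct     fp: terms extracted but wrong
    fn: correct terms that were missed  tn: non-terms correctly ignored
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, false_positive_rate


# Hypothetical counts chosen only to make the arithmetic easy to follow.
precision, recall, fpr = extraction_metrics(tp=70, fp=30, fn=20, tn=180)
```

    Note that precision and false positive rate use different denominators (extracted terms versus true non-terms), so a tool can have moderate precision and still a low false positive rate when non-terms are plentiful.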

  19. Advances in Text Mining and Visualization for Precision Medicine.

    Science.gov (United States)

    Gonzalez-Hernandez, Graciela; Sarker, Abeed; O'Connor, Karen; Greene, Casey; Liu, Hongfang

    2018-01-01

    According to the National Institutes of Health (NIH), precision medicine is "an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person." Although the text mining community has explored this realm for some years, the official endorsement and funding launched in 2015 with the Precision Medicine Initiative are beginning to bear fruit. This session sought to elicit participation of researchers with strong background in text mining and/or visualization who are actively collaborating with bench scientists and clinicians for the deployment of integrative approaches in precision medicine that could impact scientific discovery and advance the vision of precision medicine as a universal, accessible approach at the point of care.

  20. Biomedical hypothesis generation by text mining and gene prioritization.

    Science.gov (United States)

    Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor

    2014-01-01

    Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.

  1. Text mining for traditional Chinese medical knowledge discovery: a survey.

    Science.gov (United States)

    Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan

    2010-08-01

    Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and to modernize TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.

  2. Imitating manual curation of text-mined facts in biomedicine.

    Directory of Open Access Journals (Sweden)

    Raul Rodriguez-Esteban

    2006-09-01

    Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts, to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.

  3. Text mining a self-report back-translation.

    Science.gov (United States)

    Blanch, Angel; Aluja, Anton

    2016-06-01

    There are several recommendations about the routine to undertake when back translating self-report instruments in cross-cultural research. However, text mining methods have been generally ignored within this field. This work describes an innovative text mining application useful for adapting a personality questionnaire to 12 different languages. The method is divided into 3 stages: a descriptive analysis of the available back-translated instrument versions, a dissimilarity assessment between the source language instrument and the 12 back-translations, and an item assessment of item meaning equivalence. The suggested method contributes to improving the back-translation process of self-report instruments for cross-cultural research in 2 significant intertwined ways. First, it defines a systematic approach to the back translation issue, allowing for a more orderly and informed evaluation concerning the equivalence of different versions of the same instrument in different languages. Second, it provides more accurate instrument back-translations, which has direct implications for the reliability and validity of the instrument's test scores when used in different cultures/languages. In addition, this procedure can be extended to the back-translation of self-reports measuring psychological constructs in clinical assessment. Future research works could refine the suggested methodology and use additional available text mining tools. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  4. A Text-Mining Framework for Supporting Systematic Reviews.

    Science.gov (United States)

    Li, Dingcheng; Wang, Zhen; Wang, Liwei; Sohn, Sunghwan; Shen, Feichen; Murad, Mohammad Hassan; Liu, Hongfang

    2016-11-01

    Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases, respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as a relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
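    The "screening labor saved at a given recall" figures quoted above can be understood with a small sketch: abstracts are ranked by a relevance score, a reviewer reads down the ranking, and everything below the point where the target recall is reached counts as saved effort. The code below is an assumption-laden illustration, not the authors' framework; the scores stand in for any of the three ranking metrics, and the data are invented.

```python
def labor_saved_at_recall(scores, relevant, target_recall=1.0):
    """Fraction of abstracts that need not be screened to reach target recall.

    scores:   relevance score per abstract (higher means screened earlier)
    relevant: 1 if the abstract is truly relevant, else 0
    """
    total_relevant = sum(relevant)
    if total_relevant == 0:
        raise ValueError("at least one relevant abstract is required")
    # Indices of abstracts, best score first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = 0
    for screened, idx in enumerate(ranked, start=1):
        found += relevant[idx]
        if found / total_relevant >= target_recall:
            return 1 - screened / len(scores)
    return 0.0


# Hypothetical scores and gold-standard labels for ten abstracts.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.05, 0.4, 0.6, 0.15]
relevant = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]

saved_full = labor_saved_at_recall(scores, relevant)        # 100% recall
saved_half = labor_saved_at_recall(scores, relevant, 0.5)   # 50% recall
```

    In this toy example all three relevant abstracts appear within the top four of ten, so a reviewer could stop after four and skip the remaining six.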

  5. Text Mining to Support Gene Ontology Curation and Vice Versa.

    Science.gov (United States)

    Ruch, Patrick

    2017-01-01

    In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.

  6. Text mining improves prediction of protein functional sites.

    Directory of Open Access Journals (Sweden)

    Karin M Verspoor

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

  7. Text Mining Improves Prediction of Protein Functional Sites

    Science.gov (United States)

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  8. The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis.

    Science.gov (United States)

    Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J; Inzé, Dirk; Van de Peer, Yves

    2013-03-01

    Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.

  9. Empirical advances with text mining of electronic health records.

    Science.gov (United States)

    Delespierre, T; Denormandie, P; Bar-Hen, A; Josseran, L

    2017-08-22

    Korian is a private group specializing in medical accommodations for elderly and dependent people. A professional data warehouse (DWH) established in 2010 hosts all of the residents' data. Inside this information system (IS), clinical narratives (CNs) were used only by medical staff as a residents' care linking tool. The objective of this study was to show that, through qualitative and quantitative textual analysis of a relatively small physiotherapy and well-defined CN sample, it was possible to build a physiotherapy corpus and, through this process, generate a new body of knowledge by adding relevant information to describe the residents' care and lives. Meaningful words were extracted through Structured Query Language (SQL) with the LIKE function and wildcards to perform pattern matching, followed by text mining and a word cloud using R® packages. Another step involved principal components and multiple correspondence analyses, plus clustering on the same residents' sample as well as on other health data using a health model measuring the residents' care level needs. By combining these techniques, physiotherapy treatments could be characterized by a list of constructed keywords, and the residents' health characteristics were built. Feeding defects or health outlier groups could be detected, physiotherapy residents' data and their health data were matched, and differences in health situations showed qualitative and quantitative differences in physiotherapy narratives. This textual experiment using a textual process in two stages showed that text mining and data mining techniques provide convenient tools to improve residents' health and quality of care by adding new, simple, useable data to the electronic health record (EHR). When used with a normalized physiotherapy problem list, text mining through information extraction (IE), named entity recognition (NER) and data mining (DM) can provide a real advantage to describe health care, adding new medical material and

  10. Spectral signature verification using statistical analysis and text mining

    Science.gov (United States)

    DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.

    2016-05-01

    In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature to arrive at a final qualitative assessment: the textual meta-data and the numerical spectral data. Results associated with the spectral data stored in the Signature Database [1] (SigDB) are proposed. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical analysis text mining. This technique analyzes the syntax of the meta-data to provide local learning patterns and trends within the spectral data, indicative of the test spectrum's quality. Text mining applications have successfully been implemented for security [2] (text encryption/decryption), biomedical [3], and marketing [4] applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high and low quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need of an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is

  11. An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

    Science.gov (United States)

    Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel; Krallinger, Martin; Wilbur, W John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy

    2013-01-01

    In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and

  12. Hot complaint intelligent classification based on text mining

    Directory of Open Access Journals (Sweden)

    XIA Haifeng

    2013-10-01

    The complaint recognizer system plays an important role in ensuring the correct classification of hot complaints and in improving the service quantity of the telecommunications industry. Customer complaints in the telecommunications industry have the particular constraint that they must be handled within a limited time, which causes errors in the classification of hot complaints. The paper presents a model for the intelligent classification of hot complaints based on text mining, which can classify a hot complaint into the correct level of the complaint navigation. The examples show that the model can classify complaint text efficiently.

  13. OSCAR4: a flexible architecture for chemical text-mining

    Directory of Open Access Journals (Sweden)

    Jessop David M

    2011-10-01

    The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry specific text-mining tools can be built, and its development and usage are discussed.

  14. Building a glaucoma interaction network using a text mining approach.

    Science.gov (United States)

    Soliman, Maha; Nasraoui, Olfa; Cooper, Nigel G F

    2016-01-01

    The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease. A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx. This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. 
    The major findings were a set of

  15. The Role of Text Mining in Export Control

    Energy Technology Data Exchange (ETDEWEB)

    Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon [Korea Institute of Nuclear Nonproliferation and Control, Daejeon (Korea, Republic of)]

    2015-10-15

    The Korean government provides classification services to exporters. Technology such as documents and drawings is simple to copy, and new technology is easily derived from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should give sufficient consideration to previous classification cases; however, the growing number of classification cases hinders consistent classification. This has made innovative and effective new approaches necessary. IXCRS (Intelligent Export Control Review System) is proposed to meet these demands. IXCRS consists of an expert system, a semantic searching system, a full text retrieval system, an image retrieval system and a document retrieval system. The aim of the present paper is to examine the document retrieval system based on text mining and to discuss how to utilize the system. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system helps reviewers handle previous classification cases effectively. In particular, it is highly probable that similarity data will help to specify classification criteria. However, an analysis of the system showed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion relationship problem. Further research should be directed at solving these problems and applying more data mining techniques so that the system can be used as a useful tool for export control.

  16. The Role of Text Mining in Export Control

    International Nuclear Information System (INIS)

    Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon

    2015-01-01

    The Korean government provides classification services to exporters. Technology such as documents and drawings is simple to copy, and new technology is easily derived from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should give sufficient consideration to previous classification cases; however, the growing number of classification cases hinders consistent classification. This has made innovative and effective new approaches necessary. IXCRS (Intelligent Export Control Review System) is proposed to meet these demands. IXCRS consists of an expert system, a semantic searching system, a full text retrieval system, an image retrieval system and a document retrieval system. The aim of the present paper is to examine the document retrieval system based on text mining and to discuss how to utilize the system. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system helps reviewers handle previous classification cases effectively. In particular, it is highly probable that similarity data will help to specify classification criteria. However, an analysis of the system showed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion relationship problem. Further research should be directed at solving these problems and applying more data mining techniques so that the system can be used as a useful tool for export control.

  17. Text-mining analysis of mHealth research

    Science.gov (United States)

    Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”. Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical

  18. Text-mining analysis of mHealth research.

    Science.gov (United States)

    Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions
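The document term matrix (DTM) mentioned in the abstract above can be illustrated with a minimal sketch. The study itself used the Text Explorer module of JMP Pro 13; the pure-Python function below is only an assumption-laden stand-in, and the function name `document_term_matrix` and the toy documents are hypothetical.

```python
import re
from collections import Counter


def document_term_matrix(docs):
    """Build a term-count matrix: one row per document, one column per term."""
    # Tokenize into lowercase alphabetic words (a deliberately simple tokenizer).
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


# Hypothetical toy documents, not abstracts from the study sample.
docs = ["Mobile phone apps", "Smartphone apps for health", "mobile health"]
vocab, dtm = document_term_matrix(docs)
```

A matrix of this shape is the usual input to decompositions such as SVD and to document clustering; phrase detection ("phrasing") and term curation ("terming") would happen before the counting step.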

  19. Text Mining of UU-ITE Implementation in Indonesia

    Science.gov (United States)

    Hakim, Lukmanul; Kusumasari, Tien F.; Lubis, Muharman

    2018-04-01

    At present, social media and networks act as one of the main platforms for sharing information, ideas, thoughts and opinions. Many people share their knowledge and express their views on the specific topics or current hot issues that interest them. Social media texts contain rich information about complaints, comments, recommendations and suggestions, posted as spontaneous reactions or responses to government initiatives or policies intended to overcome certain issues. This study examines the sentiment of netizens, as vocal citizens, about the implementation of UU ITE, the first cyberlaw in Indonesia, as a means to identify the current tendency of citizen perception. To perform text mining techniques, this study used the Twitter REST API, while R programming was utilized for classification analysis based on hierarchical clustering.

  20. Annotated chemical patent corpus: a gold standard for text mining.

    Directory of Open Access Journals (Sweden)

    Saber A Akhondi

    Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.

  1. Information Retrieval and Text Mining Technologies for Chemistry.

    Science.gov (United States)

    Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

    2017-06-28

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

  2. Text mining applications in psychiatry: a systematic literature review.

    Science.gov (United States)

    Abbe, Adeline; Grouin, Cyril; Zweigenbaum, Pierre; Falissard, Bruno

    2016-06-01

    The expansion of biomedical literature is creating the need for efficient tools to keep pace with increasing volumes of information. Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text. We reviewed the applications of TM in psychiatry, and explored its advantages and limitations. A systematic review of the literature was carried out using the CINAHL, Medline, EMBASE, PsycINFO and Cochrane databases. In this review, 1103 papers were screened, and 38 were included as applications of TM in psychiatric research. Using TM and content analysis, we identified four major areas of application: (1) Psychopathology (i.e. observational studies focusing on mental illnesses), (2) the Patient perspective (i.e. patients' thoughts and opinions), (3) Medical records (i.e. safety issues, quality of care and description of treatments), and (4) Medical literature (i.e. identification of new scientific information in the literature). The information sources were qualitative studies, Internet postings, medical records and biomedical literature. Our work demonstrates that TM can contribute to complex research tasks in psychiatry. We discuss the benefits, limits, and further applications of this tool in the future. Copyright © 2015 John Wiley & Sons, Ltd.

  3. Sentiment analysis of Arabic tweets using text mining techniques

    Science.gov (United States)

    Al-Horaibi, Lamia; Khan, Muhammad Badruddin

    2016-07-01

    Sentiment analysis has become a flourishing field of text mining and natural language processing. Sentiment analysis aims to determine whether the text is written to express positive, negative, or neutral emotions about a certain domain. Most sentiment analysis researchers focus on English texts, with very limited resources available for other complex languages, such as Arabic. In this study, the target was to develop an initial model that performs satisfactorily and measures Arabic Twitter sentiment by using a machine learning approach, with Naïve Bayes and Decision Tree as the classification algorithms. The dataset used contains more than 2,000 Arabic tweets collected from Twitter. We performed several experiments to check the performance of the two classifiers using different combinations of text-processing functions. We found that available facilities for Arabic text processing need to be made from scratch or improved to develop accurate classifiers. The small functionalities developed by us in a Python language environment helped improve the results and proved that sentiment analysis in the Arabic domain needs a lot of work on the lexicon side.

  4. Event-based text mining for biology and functional genomics

    Science.gov (United States)

    Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

    2015-01-01

    The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365

  5. Text mining factor analysis (TFA) in green tea patent data

    Science.gov (United States)

    Rahmawati, Sela; Suprijadi, Jadi; Zulhanif

    2017-03-01

    Factor analysis has become one of the most widely used multivariate statistical procedures in applied research endeavors across a multitude of domains. There are two main types of analyses based on factor analysis: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). Both EFA and CFA aim to model the observed relationships between a group of indicators and a latent variable, but they differ fundamentally in the a priori assumptions and restrictions placed on the factor model. This method will be applied to patent data from the green tea technology sector to determine how green tea technology has developed worldwide. Patent analysis is useful in identifying the future technological trends in a specific field of technology. The patent data were obtained from the European Patent Organization (EPO). In this paper, the CFA model will be applied to nominal data obtained from the presence-absence matrix. The CFA for nominal data was based on the tetrachoric matrix. Meanwhile, the EFA model will be applied to titles from the dominant technology sector. The titles will first be pre-processed using text mining analysis.

  6. Construction accident narrative classification: An evaluation of text mining techniques.

    Science.gov (United States)

    Goh, Yang Miang; Ubeynarayana, C U

    2017-11-01

    Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
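The per-label precision, recall and F1 scores reported in the abstract above follow the standard definitions, which can be sketched as below. This is an illustrative sketch only, not the authors' evaluation code; the function name `per_label_scores` and the toy labels are hypothetical, and each narrative is assumed to carry exactly one label.

```python
from collections import Counter


def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted label was assigned wrongly
            fn[truth] += 1  # the true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy labels, not data from the study.
y_true = ["fall", "fall", "struck", "struck", "fire"]
y_pred = ["fall", "struck", "struck", "struck", "fall"]
scores = per_label_scores(y_true, y_pred)
```

Reporting scores per label, as the study does, exposes weak classes that a single overall accuracy figure would hide.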

  7. Monitoring interaction and collective text production through text mining

    Directory of Open Access Journals (Sweden)

    Macedo, Alexandra Lorandi

    2014-04-01

    This article presents the Concepts Network tool, developed using text mining technology. The main objective of this tool is to extract and relate terms of greatest incidence from a text and exhibit the results in the form of a graph. The Network was implemented in the Collective Text Editor (CTE), which is an online tool that allows the production of texts in synchronized or non-synchronized forms. This article describes the application of the Network both in texts produced collectively and texts produced in a forum. The purpose of the tool is to offer support to the teacher in managing the high volume of data generated in the process of interaction amongst students and in the construction of the text. Specifically, the aim is to facilitate the teacher’s job by allowing him/her to process data in a shorter time than is currently demanded. The results suggest that the Concepts Network can aid the teacher, as it provides indicators of the quality of the text produced. Moreover, messages posted in forums can be analyzed without their content necessarily having to be pre-read.

  8. Biomedical text mining for research rigor and integrity: tasks, challenges, directions.

    Science.gov (United States)

    Kilicoglu, Halil

    2017-06-13

    An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise. Published by Oxford University Press 2017. This work is written by a US Government employee and is in the public domain in the US.

  9. Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.

    Science.gov (United States)

    Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin

    2017-02-21

    The aim was to systematically explore the molecular mechanism for hepatocellular carcinoma (HCC) metastasis and identify regulatory genes with text mining methods. Genes with the highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins, such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in the PPI (protein-protein interaction) network. Compared with the genes unique to HBV-HCCs, the genes particular to HCV-HCCs were fewer but may participate in more extensive signaling processes. VEGFA, PI3KCA, MAPK1, MMP9 and other genes may play important roles in multiple phenotypes of metastasis. Genes in abstracts of the HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway and PPI network analysis were performed. Then co-occurrence analysis between genes and metastasis-related phenotypes was carried out. Text mining is effective for revealing potential regulators or pathways, but its purpose should be specific, and combining various methods will be more useful.
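The co-occurrence analysis between genes and phenotypes mentioned in the abstract above can be sketched as a simple count of abstracts in which both terms appear. The abstract does not describe the authors' actual procedure, so everything here is an assumption: the function name `cooccurrence_counts`, the single-word matching, the phenotype terms, and the toy sentences, which are invented for illustration and do not state findings.

```python
import re
from collections import Counter
from itertools import product


def cooccurrence_counts(abstracts, genes, phenotypes):
    """Count abstracts in which a gene and a phenotype term both appear."""
    counts = Counter()
    for text in abstracts:
        # Case-insensitive set of word tokens in this abstract.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        found_genes = [g for g in genes if g.lower() in tokens]
        found_phenos = [p for p in phenotypes if p.lower() in tokens]
        # Every gene/phenotype pair present in the same abstract counts once.
        for pair in product(found_genes, found_phenos):
            counts[pair] += 1
    return counts


# Invented toy sentences and hypothetical phenotype terms.
abstracts = [
    "VEGFA promotes angiogenesis and invasion in HCC.",
    "MMP9 expression correlates with invasion and migration.",
    "VEGFA levels were associated with angiogenesis.",
]
counts = cooccurrence_counts(
    abstracts, ["VEGFA", "MMP9"], ["invasion", "migration", "angiogenesis"]
)
```

Matching on single tokens keeps the sketch short; multi-word phenotypes and gene synonyms would need a dictionary-based tagger instead.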

  10. Public reactions to e-cigarette regulations on Twitter: a text mining analysis.

    Science.gov (United States)

    Lazard, Allison J; Wilcox, Gary B; Tuttle, Hannah M; Glowacki, Elizabeth M; Pikowski, Jessica

    2017-12-01

    In May 2016, the Food and Drug Administration (FDA) issued a final rule that deemed e-cigarettes to be within their regulatory authority as a tobacco product. News and opinions about the regulation were shared on social media platforms, such as Twitter, which can play an important role in shaping the public's attitudes. We analysed information shared on Twitter for insights into initial public reactions. A text mining approach was used to uncover important topics among reactions to the e-cigarette regulations on Twitter. SAS Text Miner V.12.1 software was used for descriptive text mining to uncover the primary topics from tweets collected from May 1 to May 17 2016 using NUVI software to gather the data. A total of nine topics were generated. These topics reveal initial reactions to whether the FDA's e-cigarette regulations will benefit or harm public health, how the regulations will impact the emerging e-cigarette market and efforts to share the news. The topics were dominated by negative or mixed reactions. In the days following the FDA's announcement of the new deeming regulations, the public reaction on Twitter was largely negative. Public health advocates should consider using social media outlets to better communicate the policy's intentions, reach and potential impact for public good to create a more balanced conversation. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  11. Seqenv: linking sequences to environments through text mining.

    Science.gov (United States)

    Sinclair, Lucas; Ijaz, Umer Z; Jensen, Lars Juhl; Coolen, Marco J L; Gubry-Rangin, Cecile; Chroňáková, Alica; Oulas, Anastasis; Pavloudi, Christina; Schnetzer, Julia; Weimann, Aaron; Ijaz, Ali; Eiler, Alexander; Quince, Christopher; Pafilis, Evangelos

    2016-01-01

    Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts the textual metadata field if it is available. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
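The "weighted summation" step described in the abstract above, which turns per-sequence environment terms into a per-sample summary, might look roughly like the sketch below. This is not seqenv's code or API. The function name `summarize_sample`, the use of sequence abundance as the weight, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_sample(abundances, term_profiles):
    """Combine per-sequence term frequencies into one per-sample profile.

    abundances:    {sequence_id: abundance of that sequence in the sample}
    term_profiles: {sequence_id: {environment_term: relative frequency}}
    """
    totals = defaultdict(float)
    for seq_id, abundance in abundances.items():
        # Each sequence contributes its term frequencies, weighted by abundance.
        for term, freq in term_profiles.get(seq_id, {}).items():
            totals[term] += abundance * freq
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    # Normalize so the sample profile sums to 1.
    return {term: value / grand_total for term, value in totals.items()}


# Hypothetical toy values, not results from the paper.
abundances = {"seq1": 30, "seq2": 10}
term_profiles = {
    "seq1": {"soil": 1.0},
    "seq2": {"soil": 0.5, "marine": 0.5},
}
profile = summarize_sample(abundances, term_profiles)
```

With these toy inputs the abundant soil-associated sequence dominates the sample profile, which is the intended effect of weighting.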

  12. Seqenv: linking sequences to environments through text mining

    Directory of Open Access Journals (Sweden)

    Lucas Sinclair

    2016-12-01

    Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the “nt” nucleotide database provided by NCBI and, out of every hit, extracts the textual metadata field if it is available. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.

  13. Signal Detection Framework Using Semantic Text Mining Techniques

    Science.gov (United States)

    Sudarsan, Sithu D.

    2009-01-01

    Signal detection is a challenging task for regulatory and intelligence agencies. Subject matter experts in those agencies analyze documents, generally containing narrative text in a time bound manner for signals by identification, evaluation and confirmation, leading to follow-up action e.g., recalling a defective product or public advisory for…

  14. The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study on Arabidopsis

    Science.gov (United States)

    Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J.; Inzé, Dirk; Van de Peer, Yves

    2013-01-01

    Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies. PMID:23532071

  15. DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

    Science.gov (United States)

    Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K

    2016-01-01

    The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.
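
    The mutation-mention step described above can be illustrated with a small pattern matcher. This is a minimal sketch, not DiMeX's own patterns, which the abstract does not give: the regular expression and the function name find_mutations are assumptions, and only simple protein-level substitutions such as "V600E" or "p.Val600Glu" are covered.

```python
import re

# One-letter and three-letter amino acid codes (standard alphabet).
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Hypothetical pattern: reference residue, position, altered residue,
# with an optional "p." prefix. Not taken from the DiMeX paper.
MUTATION = re.compile(
    rf"\b(?:p\.)?(?P<ref>[{ONE_LETTER}]|{THREE_LETTER})"
    rf"(?P<pos>[1-9]\d*)"
    rf"(?P<alt>[{ONE_LETTER}]|{THREE_LETTER})\b"
)


def find_mutations(sentence):
    """Return (reference residue, position, altered residue) per mention."""
    return [(m["ref"], int(m["pos"]), m["alt"]) for m in MUTATION.finditer(sentence)]


if __name__ == "__main__":
    text = "The BRAF V600E mutation and p.Val600Glu were both reported in melanoma."
    print(find_mutations(text))
```

    A pattern this simple also matches tokens that merely look like substitutions (for example "T2D"), which is one reason a real system would add the syntactic and semantic checks the abstract mentions.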

  16. Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text.

    Science.gov (United States)

    Garten, Yael; Altman, Russ B

    2009-02-05

    Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed in many journals. It is challenging, therefore, to identify important associations between drugs and molecular entities--particularly genes and gene variants, and thus these critical connections are often lost. Text mining techniques can allow us to convert the free-style text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input into text mining engines is key, as literature abstracts often do not contain sufficient information to identify these pharmacogenomic associations. Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively. Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.

  17. DDMGD: the database of text-mined associations between genes methylated in diseases from different species

    KAUST Repository

    Raies, A. B.; Mansour, H.; Incitti, R.; Bajic, Vladimir B.

    2014-01-01

    ://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we

  18. Negation scope and spelling variation for text-mining of Danish electronic patient records

    DEFF Research Database (Denmark)

    Thomas, Cecilia Engel; Jensen, Peter Bjødstrup; Werge, Thomas

    2014-01-01

    Electronic patient records are a potentially rich data source for knowledge extraction in biomedical research. Here we present a method based on the ICD10 system for text-mining of Danish health records. We have evaluated how adding functionalities to a baseline text-mining tool affected...

  19. Text mining for adverse drug events: the promise, challenges, and state of the art.

    Science.gov (United States)

    Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H

    2014-10-01

    Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources, such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs, that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.

  20. Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.

    Science.gov (United States)

    Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong

    2016-01-01

    Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions including (i) the 'scalability' issue due to the increasing need of mining information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

  1. MET network in PubMed: a text-mined network visualization and curation system.

    Science.gov (United States)

    Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

    2016-01-01

    Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway. © The Author(s) 2016. Published by Oxford University Press.

  2. Classifying unstructured textual data using the Product Score Model: an alternative text mining algorithm

    NARCIS (Netherlands)

    He, Qiwei; Veldkamp, Bernard P.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which creates difficulties in analysis. Text mining techniques that seek to extract useful

  3. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

    Science.gov (United States)

    Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

    2018-04-27

    A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
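
    The abstract does not say which load balancing strategy paraBTM uses. As an illustration of what a low-cost strategy can look like, the sketch below applies a generic greedy rule (largest document first, always to the least-loaded worker). The function name balance, the document sizes and the worker count are all assumptions made for the example.

```python
import heapq


def balance(doc_sizes, n_workers):
    """Assign documents to workers, largest first, to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    by_size = sorted(doc_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for doc, size in by_size:
        load, worker = heapq.heappop(heap)  # worker with the smallest load
        assignment[worker].append(doc)
        heapq.heappush(heap, (load + size, worker))
    return assignment


if __name__ == "__main__":
    # Hypothetical document sizes (e.g. kilobytes of text).
    sizes = {"a": 90, "b": 60, "c": 50, "d": 40, "e": 10}
    plan = balance(sizes, n_workers=2)
    for worker, docs in plan.items():
        print(worker, docs, sum(sizes[d] for d in docs))
```

    Sorting by size costs O(n log n) and needs no runtime feedback from the workers, which is the sense in which such a rule is cheap; it assumes processing time grows with document size.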

  4. Using text-mining techniques in electronic patient records to identify ADRs from medicine use

    DEFF Research Database (Denmark)

    Warrer, Pernille; Hansen, Ebba Holme; Jensen, Lars Juhl

    2012-01-01

    This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We...... included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs......, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text...

  5. pubmed.mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2015-09-29

    Sep 29, 2015 ... using text-mining algorithms for biomedical research purposes. ... studies are described to illustrate some potential uses of ... This is the most applied task. ... other alphabets (for example, Greek alphabets) and hyphens.

  6. Using text mining for study identification in systematic reviews: a systematic review of current approaches

    OpenAIRE

    O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia

    2015-01-01

    Background The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic...

  7. A Survey of Text Mining in Social Media: Facebook and Twitter Perspectives

    Directory of Open Access Journals (Sweden)

    Said A. Salloum

    2017-01-01

    Text mining has become one of the trendy fields that has been incorporated in several research fields such as computational linguistics, Information Retrieval (IR) and data mining. Natural Language Processing (NLP) techniques were used to extract knowledge from text written by human beings. Text mining reads an unstructured form of data to provide meaningful information patterns in the shortest time period. Social networking sites are a great source of communication, as most people in today’s world use these sites in their daily lives to stay connected to each other. It has become common practice not to write sentences with correct grammar and spelling. This practice may lead to different kinds of ambiguities, such as lexical, syntactic, and semantic ones, and because of this unclear data it is hard to find out the actual data order. Accordingly, we are conducting an investigation with the aim of looking for different text mining methods to get various textual orders on social media websites. This survey aims to describe how studies in social media have used text analytics and text mining techniques for the purpose of identifying the key themes in the data. This survey focused on analyzing the text mining studies related to Facebook and Twitter, the two dominant social media in the world. Results of this survey can serve as the baselines for future text mining research.

  8. Using text-mining techniques in electronic patient records to identify ADRs from medicine use.

    Science.gov (United States)

    Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise

    2012-05-01

    This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.

  9. Automated detection of follow-up appointments using text mining of discharge records.

    Science.gov (United States)

    Ruud, Kari L; Johnson, Matthew G; Liesinger, Juliette T; Grafft, Carrie A; Naessens, James M

    2010-06-01

    To determine whether text mining can accurately detect specific follow-up appointment criteria in free-text hospital discharge records. Cross-sectional study. Mayo Clinic Rochester hospitals. Inpatients discharged from general medicine services in 2006 (n = 6481). Textual hospital dismissal summaries were manually reviewed to determine whether the records contained specific follow-up appointment arrangement elements: date, time and either physician or location for an appointment. The data set was evaluated for the same criteria using SAS Text Miner software. The two assessments were compared to determine the accuracy of text mining for detecting records containing follow-up appointment arrangements. Agreement of text-mined appointment findings with gold standard (manual abstraction) including sensitivity, specificity, positive predictive and negative predictive values (PPV and NPV). About 55.2% (3576) of discharge records contained all criteria for follow-up appointment arrangements according to the manual review, 3.2% (113) of which were missed through text mining. Text mining incorrectly identified 3.7% (107) follow-up appointments that were not considered valid through manual review. Therefore, the text mining analysis concurred with the manual review in 96.6% of the appointment findings. Overall sensitivity and specificity were 96.8 and 96.3%, respectively; and PPV and NPV were 97.0 and 96.1%, respectively. Analysis of individual appointment criteria resulted in accuracy rates of 93.5% for date, 97.4% for time, 97.5% for physician and 82.9% for location. Text mining of unstructured hospital dismissal summaries can accurately detect documentation of follow-up appointment arrangement elements, thus saving considerable resources for performance assessment and quality-related research.
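
    The reported percentages can be reproduced from the counts given in the abstract. The sketch below assumes the natural reading of those counts: 6481 records in total, 3576 positive by manual review, 113 of those missed by text mining (false negatives), and 107 records flagged by text mining but rejected by manual review (false positives). The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # has appointment (manual review), found by text mining
    fn: int  # has appointment (manual review), missed by text mining
    fp: int  # no valid appointment, but flagged by text mining
    tn: int  # no valid appointment, not flagged by text mining


def counts_from_abstract(total=6481, manual_positive=3576, missed=113, false_alarms=107):
    """Derive the 2x2 table from the totals reported in the abstract."""
    manual_negative = total - manual_positive
    return ConfusionCounts(
        tp=manual_positive - missed,
        fn=missed,
        fp=false_alarms,
        tn=manual_negative - false_alarms,
    )


def metrics(c):
    """Return the standard diagnostic accuracy measures as percentages."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
        "agreement": 100 * (c.tp + c.tn) / total,
    }


if __name__ == "__main__":
    for name, value in metrics(counts_from_abstract()).items():
        print(f"{name}: {value:.1f}%")
```

    Under these assumptions the table is TP = 3463, FN = 113, FP = 107, TN = 2798, which gives 96.8% sensitivity, 96.3% specificity, 97.0% PPV, 96.1% NPV and 96.6% overall agreement, matching the figures in the abstract.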

  10. Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

    Science.gov (United States)

    Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

    2013-01-16

    The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a) For text mining, preprocessing the EHR corpus with fingerprinting yields
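
    To make the idea of fingerprinting-based redundancy removal concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not describe: hashing word trigrams, the 0.5 threshold, and the function names are all assumptions chosen for illustration.

```python
import hashlib


def fingerprints(text, n=3):
    """Hash every run of n consecutive words into a set of fingerprints."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def redundancy(note, seen):
    """Fraction of the note's fingerprints that already appear in seen."""
    prints = fingerprints(note)
    if not prints:
        return 0.0
    return len(prints & seen) / len(prints)


def drop_redundant(notes, threshold=0.5):
    """Keep notes, in order, whose redundancy against kept notes is below threshold."""
    seen, kept = set(), []
    for note in notes:
        if redundancy(note, seen) < threshold:
            kept.append(note)
            seen |= fingerprints(note)
    return kept


if __name__ == "__main__":
    # Hypothetical notes; the second copies most of the first.
    notes = [
        "patient reports chest pain on exertion",
        "patient reports chest pain on exertion and mild nausea",
        "no known drug allergies were recorded",
    ]
    print(drop_redundant(notes))
```

    In this toy run the second note shares 4 of its 7 trigrams with the first and is dropped. The same redundancy function can also serve as a simple corpus-level measure of copy-and-paste overlap.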

  11. Aspects of Text Mining From Computational Semiotics to Systemic Functional Hypertexts

    Directory of Open Access Journals (Sweden)

    Alexander Mehler

    2001-05-01

    The significance of natural language texts as the prime information structure for the management and dissemination of knowledge in organisations is still increasing. Making relevant documents available depending on varying tasks in different contexts is of primary importance for any efficient task completion. Implementing this demand requires the content-based processing of texts, which makes it possible to reconstruct or, if necessary, to explore the relationship of task, context and document. Text mining is a technology that is suitable for solving problems of this kind. In the following, semiotic aspects of text mining are investigated. Based on the primary object of text mining, natural language lexis, the specific complexity of this class of signs is outlined and requirements for the implementation of text mining procedures are derived. This is done with reference to text linkage introduced as a special task in text mining. Text linkage refers to the exploration of implicit, content-based relations of texts (and their annotation as typed links in corpora possibly organised as hypertexts). In this context, the term systemic functional hypertext is introduced, which distinguishes genre and register layers for the management of links in a poly-level hypertext system.

  12. Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

    Science.gov (United States)

    Lu, Zhiyong; Hirschman, Lynette

    2012-01-01

    Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.

  13. Text mining and visualization case studies using open-source tools

    CERN Document Server

    Chisholm, Andrew

    2016-01-01

    Text Mining and Visualization: Case Studies Using Open-Source Tools provides an introduction to text mining using some of the most popular and powerful open-source tools: KNIME, RapidMiner, Weka, R, and Python. The contributors, all highly experienced with text mining and open-source software, explain how text data are gathered and processed from a wide variety of sources, including books, server access logs, websites, social media sites, and message boards. Each chapter presents a case study that you can follow as part of a step-by-step, reproducible example. You can also easily apply and extend the techniques to other problems. All the examples are available on a supplementary website. The book shows you how to exploit your text data, offering successful application examples and blueprints for you to tackle your text mining tasks and benefit from open and freely available tools. It gets you up to date on the latest and most powerful tools, the data mining process, and specific text mining activities.

  14. BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

    Directory of Open Access Journals (Sweden)

    Tsafnat Guy

    2011-04-01

    Background: The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest. Results: BICEPP first retrieves MEDLINE abstracts containing drug names, then selects tokens that best predict the list of drugs which represents the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validations revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, it was observed that the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task. Conclusions: BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. This may be used in pharmacovigilance applications to assist with rapid screening of large drug databases to identify important characteristics for further evaluation.
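
    The AUC threshold used in the evaluation above has a simple interpretation: it is the probability that a randomly chosen drug with the characteristic receives a higher score than a randomly chosen drug without it. The sketch below computes AUC directly from that definition. It is a generic illustration, not BICEPP code, and the scores and labels are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical classifier scores for five drugs; 1 = has the characteristic.
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0]
    print(f"AUC = {auc(scores, labels):.3f}")
```

    Here 5 of the 6 positive-negative pairs are ranked correctly, so the AUC is about 0.833, which would clear the 0.80 cut-off used in the abstract. The pairwise form is quadratic in the number of examples, so it suits small illustrations rather than large evaluations.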

  15. Compatibility between Text Mining and Qualitative Research in the Perspectives of Grounded Theory, Content Analysis, and Reliability

    Science.gov (United States)

    Yu, Chong Ho; Jannasch-Pennell, Angel; DiGangi, Samuel

    2011-01-01

    The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated…

  16. BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Cathy H. [Univ. of Delaware, Newark, DE (United States). Center for Bioinformatics and Computational Biology]; Hirschman, Lynette [The MITRE Corporation, Bedford, MA (United States)]

    2016-10-29

    The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.

  17. Evolution of bayesian-related research over time: a temporal text mining task

    CSIR Research Space (South Africa)

    de Waal, A

    2006-06-01

    Ronald Reagan’s Radio Addresses? Bayesian Analysis 2006, Volume 1, Number 2, pp. 189-383. 2. Mei Q and Zhai C, 2005. Discovering Evolutionary Theme Patterns from Text – An Exploration of Temporal Text Mining. KDD’05, August 21-24, 2005. Chicago...

  18. DISEASES: text mining and data integration of disease-gene associations.

    Science.gov (United States)

    Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

    2015-03-01

    Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
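
    The combination of dictionary-based tagging with co-occurrence scoring can be sketched in a few lines. This is only an illustration of the general idea: the toy dictionaries, the two weights and the function names are assumptions, and the actual scoring scheme of the DISEASES resource is not given in the abstract.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical toy dictionaries; a real tagger would use large curated lexicons.
GENES = {"brca1", "tp53"}
DISEASES = {"breast cancer", "leukemia"}

SAME_SENTENCE_WEIGHT = 1.0   # assumed weight, not taken from the paper
SAME_ABSTRACT_WEIGHT = 0.25  # assumed weight, not taken from the paper


def find_terms(text, dictionary):
    """Return the dictionary terms that occur in text as whole words."""
    lowered = text.lower()
    return {t for t in dictionary if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def pairs_in(text):
    """All (gene, disease) pairs co-mentioned in the given piece of text."""
    return set(product(find_terms(text, GENES), find_terms(text, DISEASES)))


def score_abstracts(abstracts):
    """Sum co-occurrence evidence; same-sentence pairs get the larger weight."""
    scores = defaultdict(float)
    for abstract in abstracts:
        sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
        same_sentence = set()
        for sentence in sentences:
            same_sentence |= pairs_in(sentence)
        for pair in pairs_in(abstract):
            if pair in same_sentence:
                scores[pair] += SAME_SENTENCE_WEIGHT
            else:
                scores[pair] += SAME_ABSTRACT_WEIGHT
    return dict(scores)


if __name__ == "__main__":
    corpus = [
        "BRCA1 mutations raise the risk of breast cancer. TP53 was also sequenced.",
        "TP53 is altered in leukemia.",
    ]
    for pair, score in sorted(score_abstracts(corpus).items()):
        print(pair, score)
```

    In this toy corpus, TP53 and breast cancer appear in the same abstract but never in the same sentence, so that pair receives only the weaker between-sentence weight.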

  19. The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

    Science.gov (United States)

    Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

    2013-01-01

    The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…

  20. Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming

    Science.gov (United States)

    Abdous, M'hammed; He, Wu

    2011-01-01

    Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology-related problems and to…

  1. Trends of E-Learning Research from 2000 to 2008: Use of Text Mining and Bibliometrics

    Science.gov (United States)

    Hung, Jui-long

    2012-01-01

    This study investigated the longitudinal trends of e-learning research using text mining techniques. Six hundred and eighty-nine (689) refereed journal articles and proceedings were retrieved from the Science Citation Index/Social Science Citation Index database in the period from 2000 to 2008. All e-learning publications were grouped into two…

  2. Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.

    Science.gov (United States)

    Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria

    2001-01-01

    Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…

  3. Analysis of Nature of Science Included in Recent Popular Writing Using Text Mining Techniques

    Science.gov (United States)

    Jiang, Feng; McComas, William F.

    2014-01-01

    This study examined the inclusion of nature of science (NOS) in popular science writing to determine whether it could serve as a supplementary resource for teaching NOS and to evaluate the accuracy of text mining and classification as a viable research tool in science education research. Four groups of documents published from 2001 to 2010 were…

  4. Complementing the Numbers: A Text Mining Analysis of College Course Withdrawals

    Science.gov (United States)

    Michalski, Greg V.

    2011-01-01

    Excessive college course withdrawals are costly to the student and the institution in terms of time to degree completion, available classroom space, and other resources. Although generally well quantified, detailed analysis of the reasons given by students for course withdrawal is less common. To address this, a text mining analysis was performed…

  5. Integrated Text Mining and Chemoinformatics Analysis Associates Diet to Health Benefit at Molecular Level

    DEFF Research Database (Denmark)

    Jensen, Kasper; Panagiotou, Gianni; Kouskoumvekaki, Irene

    2014-01-01

    , lipids and nutrients. In this work, we applied text mining and Naïve Bayes classification to assemble the knowledge space of food-phytochemical and food-disease associations, where we distinguish between disease prevention/amelioration and disease progression. We subsequently searched for frequently...

  6. An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.

    Science.gov (United States)

    Trybula, Walter J.; Wyllys, Ronald E.

    2000-01-01

    Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)

  7. PubstractHelper: A Web-based Text-Mining Tool for Marking Sentences in Abstracts from PubMed Using Multiple User-Defined Keywords.

    Science.gov (United States)

    Chen, Chou-Cheng; Ho, Chung-Liang

    2014-01-01

    While a huge amount of information about biological literature can be obtained by searching the PubMed database, reading through all the titles and abstracts resulting from such a search for useful information is inefficient. Text mining makes it possible to increase this efficiency. Some websites use text mining to gather information from the PubMed database; however, they are database-oriented, using pre-defined search keywords while lacking a query interface for user-defined search inputs. We present the PubMed Abstract Reading Helper (PubstractHelper) website which combines text mining and reading assistance for an efficient PubMed search. PubstractHelper can accept a maximum of ten groups of keywords, within each group containing up to ten keywords. The principle behind the text-mining function of PubstractHelper is that keywords contained in the same sentence are likely to be related. PubstractHelper highlights sentences with co-occurring keywords in different colors. The user can download the PMID and the abstracts with color markings to be reviewed later. The PubstractHelper website can help users to identify relevant publications based on the presence of related keywords, which should be a handy tool for their research. http://bio.yungyun.com.tw/ATM/PubstractHelper.aspx and http://holab.med.ncku.edu.tw/ATM/PubstractHelper.aspx.
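    The stated principle, that keywords found in the same sentence are likely to be related, can be illustrated with a short sketch. The group names, the sample text and the rule of requiring hits from at least two groups are assumptions for the example; the actual PubstractHelper logic is not described in the abstract.

```python
import re


def split_sentences(abstract):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def mark_sentences(abstract, keyword_groups, min_groups=2):
    """Return (sentence, matched group names) for sentences in which
    keywords from at least min_groups different groups co-occur."""
    marked = []
    for sentence in split_sentences(abstract):
        lowered = sentence.lower()
        hits = [name for name, keywords in keyword_groups.items()
                if any(k.lower() in lowered for k in keywords)]
        if len(hits) >= min_groups:
            marked.append((sentence, hits))
    return marked


# Hypothetical user-defined keyword groups.
groups = {"gene": ["BACE1", "MMP2"], "disease": ["Alzheimer"]}
text = "BACE1 is studied in Alzheimer's disease. MMP2 was measured separately."
marked = mark_sentences(text, groups)
```

    Only the first sentence is marked here, because the second contains a keyword from a single group.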

  8. Vaccine adverse event text mining system for extracting features from vaccine safety reports.

    Science.gov (United States)

    Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert

    2012-01-01

    To develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined the primary (diagnosis and cause of death) and secondary features (eg, symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this equaled the sensitivity of the text classifier (83.1%) and exceeded that of the online tool (40.7%). Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
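    For readers less familiar with the metrics named in this abstract, the following sketch shows how recall, precision, F-measure, sensitivity and specificity are commonly derived from confusion counts. The counts are invented toy values chosen for easy arithmetic; they are not taken from the VaeTM evaluation.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and balanced F-measure from extraction counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy counts, assumed for illustration only.
precision, recall, f_measure = precision_recall_f(tp=8, fp=2, fn=2)
sensitivity, specificity = sensitivity_specificity(tp=8, fp=2, tn=18, fn=2)
```

    Sensitivity is the same quantity as recall; specificity additionally needs the true negatives, which is why it is reported for case classification rather than for feature extraction.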

  9. Text mining approach to predict hospital admissions using early medical records from the emergency department.

    Science.gov (United States)

    Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz

    2017-04-01

    Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing of text records and to predict hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel). Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
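    The three representations mentioned above (binary, term frequency, and term frequency-inverse document frequency) over n-gram features can be sketched in plain Python as below. The sample records are invented, and the unsmoothed IDF formula log(N/df) is an assumption; the abstract does not say which variant the authors used, and common libraries apply smoothed versions.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """All contiguous n-grams of the token list for n = 1..n_max."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def featurize(docs, n_max=3):
    """Per document, map each n-gram to its binary, tf and tf-idf value."""
    tokenized = [ngrams(d.lower().split(), n_max) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(term for terms in tokenized for term in set(terms))
    n_docs = len(docs)
    rows = []
    for terms in tokenized:
        tf = Counter(terms)
        rows.append({
            term: {
                "binary": 1,
                "tf": count,
                # Assumed unsmoothed IDF: log(N / df).
                "tfidf": count * math.log(n_docs / df[term]),
            }
            for term, count in tf.items()
        })
    return rows


# Invented toy records, unigrams and bigrams only.
docs = ["chest pain chest tightness", "mild chest pain"]
rows = featurize(docs, n_max=2)
```

    Terms that occur in every record, such as "chest" here, get a tf-idf weight of zero under this formula, whereas a term unique to one record keeps a positive weight.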

  10. Knowledge based word-concept model estimation and refinement for biomedical text mining.

    Science.gov (United States)

    Jimeno Yepes, Antonio; Berlanga, Rafael

    2015-02-01

    Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, thus performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not useful for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. SWIFT-Review: a text-mining workbench for systematic review.

    Science.gov (United States)

    Howard, Brian E; Phillips, Jason; Miller, Kyle; Tandon, Arpit; Mav, Deepak; Shah, Mihir R; Holmgren, Stephanie; Pelch, Katherine E; Walker, Vickie; Rooney, Andrew A; Macleod, Malcolm; Shah, Ruchir R; Thayer, Kristina

    2016-05-23

    effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation. Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.

  12. Using text mining for study identification in systematic reviews: a systematic review of current approaches.

    Science.gov (United States)

    O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia

    2015-01-14

    The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. 
    The use of text mining as a 'second screener' may also be used cautiously

  13. The WONP-NURT corpus as nuclear knowledge base for text mining in the INIS database

    International Nuclear Information System (INIS)

    Guerra Valdes, R.

    2011-01-01

    In the present work the WONP-NURT corpus is taken as knowledge base for text mining in the INIS database. Main components of the information processing system, as well as computational methods for content analysis of INIS database record files are described. Results of the content analysis of the WONP-NURT corpus are reported. Furthermore, results of two comparative text mining studies in the INIS database are also shown. The first one explores 10 research areas in the more familiar nearest range of WONP-NURT corpus, while the second one surveys 15 regions in the more exotic far range. The results provide new elements to assess the significance of the WONP-NURT corpus in the context of the current state of nuclear science and technology research areas. (Author)

  14. Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

    DEFF Research Database (Denmark)

    Debortoli, Stefan; Müller, Oliver; Junglas, Iris

    2016-01-01

    It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video); and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic... researchers, this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality of others.

  15. From university research to innovation Detecting knowledge transfer via text mining

    DEFF Research Database (Denmark)

    Woltmann, Sabrina; Clemmensen, Line Katrine Harder; Alkærsig, Lars

    2016-01-01

    and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern recognition. Text samples for this purpose can include files containing social media contents, company websites and annual reports. The empirical focus in the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent... associated the former with the latter to obtain insights into possible text and semantic relatedness. The text mining methods are extrapolating the correlations, semantic patterns and content comparison of the two corpora to define the document relatedness. We expect the development of a novel tool using...

  16. Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

    Science.gov (United States)

    Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R

    2015-12-01

    This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. Towards A Model Of Knowledge Extraction Of Text Mining For Palliative Care Patients In Panama.

    Directory of Open Access Journals (Sweden)

    Denis Cedeno Moreno

    2015-08-01

    Solutions based on information technology are an innovative way to manage the information of hospice patients in hospitals in Panama. The application of text mining techniques to the domain of medicine, especially to information from the electronic health records of patients in palliative care, is one of the most recent and promising research areas for the analysis of textual data. Text mining is based on extracting new knowledge from unstructured natural language data. We may also create ontologies to describe the terminology and knowledge in a given domain. An ontology formalizes the conceptualization of a domain, which may be general or specific. The extracted knowledge can be used for decision making by health specialists or can help in research topics for improving the health system.

  18. Text Mining Untuk Analisis Sentimen Review Film Menggunakan Algoritma K-Means

    Directory of Open Access Journals (Sweden)

    Setyo Budi

    2017-02-01

    The ease with which people use websites has led to a growing number of text documents containing opinions and information. Over time, these collections of text documents grow large. Text mining is one of the techniques used to mine a collection of text documents so that its essence can be extracted. Several algorithms are used to mine documents for sentiment analysis, one of which is K-Means. The algorithm used in this study is K-Means. The results show that the accuracy of K-Means was 57.83% with a dataset of 300 positive and 300 negative documents, 56.71% with 700 positive and 700 negative documents, and 50.40% with 1000 positive and 1000 negative documents. From the test results it is concluded that the larger the dataset used, the lower the accuracy of K-Means. Keywords: Text Mining, Sentiment Analysis, K-Means, Film Review

  19. Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media

    OpenAIRE

    Chen, Hao; McKeever, Susan; Delany, Sarah Jane

    2016-01-01

    The issues of cyberbullying and online harassment have gained considerable coverage in the last number of years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouT...

  20. Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

    Science.gov (United States)

    Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia

    2015-01-01

    Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. 
    Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
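    The micro-averaged F-score reported for the workflow pools the counts of all annotation categories before computing precision and recall, in contrast to a macro average that weights every category equally. A minimal sketch of the difference follows; the category names and counts are invented and do not come from the COPD corpus.

```python
def f1(tp, fp, fn):
    """Balanced F-score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_f1(counts):
    """Pool counts over all categories, then compute one F-score."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute an F-score per category, then take the unweighted mean."""
    return sum(f1(**c) for c in counts.values()) / len(counts)


# Hypothetical per-category counts: one frequent, one rare category.
counts = {
    "Symptom": {"tp": 90, "fp": 10, "fn": 10},
    "Drug": {"tp": 1, "fp": 9, "fn": 9},
}
micro = micro_f1(counts)
macro = macro_f1(counts)
```

    With these toy counts the micro average is dominated by the frequent category, while the macro average is pulled down by the rare one.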

  1. Application of Ferulic Acid for Alzheimer's Disease: Combination of Text Mining and Experimental Validation.

    Science.gov (United States)

    Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Zhao, Yanxin; Liu, Xueyuan

    2018-01-01

    Alzheimer's disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literature that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most likely mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in a decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA-induced BACE1 and MMP2 pathways may be novel potential mechanisms involved in AD. The text mining of literature and TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies.

  2. Text Mining Untuk Analisis Sentimen Review Film Menggunakan Algoritma K-Means

    OpenAIRE

    Setyo Budi

    2017-01-01

    The ease with which people use websites has led to a growing number of text documents containing opinions and information. Over time, these collections of text documents grow large. Text mining is one of the techniques used to mine a collection of text documents so that its essence can be extracted. Several algorithms are used to mine documents for sentiment analysis, one of which is K-Means. The algorithm used in this study is K-Means. The results...

  3. A tm Plug-In for Distributed Text Mining in R

    Directory of Open Access Journals (Sweden)

    Stefan Theussl

    2012-11-01

    R has gained explicit text mining support with the tm package, enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data to be processed on a single machine is usually limited by the available main memory (i.e., RAM), and (2) the more data to be analyzed, the higher the need for efficient procedures for calculating valuable results. Fortunately, adequate programming models like MapReduce facilitate parallelization of text mining tasks and allow for processing data sets beyond what would fit into memory by using a distributed file system possibly spanning over several machines, e.g., in a cluster of workstations. In this paper we present a plug-in package to tm called tm.plugin.dc implementing a distributed corpus class which can take advantage of the Hadoop MapReduce library for large scale text mining tasks. We show on the basis of an application in culturomics that we can efficiently handle data sets of significant size.
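    The MapReduce idea referred to above can be illustrated with a small in-process sketch: each chunk of the corpus is mapped to partial term counts, and the partial results are then reduced into one total. This is an illustration of the programming model only, written in Python rather than R; it does not use Hadoop or the tm.plugin.dc package, and the function names and toy corpus are assumptions.

```python
from collections import Counter
from functools import reduce


def chunked(docs, size):
    """Split the corpus into chunks that could each be processed on a separate worker."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def map_chunk(chunk):
    """Map step: count terms within a single chunk only."""
    counts = Counter()
    for doc in chunk:
        counts.update(doc.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial term-count tables."""
    return left + right


# Toy corpus; in a distributed setting the chunks would live on different machines.
corpus = ["text mining in R", "distributed text mining", "mining large corpora"]
partials = map(map_chunk, chunked(corpus, 2))
total = reduce(reduce_counts, partials, Counter())
```

    Because the map step never needs more than one chunk in memory, the same pattern extends to corpora larger than the RAM of any single machine.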

  4. Stopping Antidepressants and Anxiolytics as Major Concerns Reported in Online Health Communities: A Text Mining Approach.

    Science.gov (United States)

    Abbe, Adeline; Falissard, Bruno

    2017-10-23

    Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in patients' online community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms, effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to that of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
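    The two techniques named in this abstract, word frequency in a document-term matrix and word co-occurrence as input to a network analysis, can be sketched as follows. The discussion titles and the stop-word list are invented for illustration and are not taken from the forum studied.

```python
from collections import Counter
from itertools import combinations

# Assumed minimal stop-word list for the example.
STOPWORDS = {"and", "on", "the", "after", "with", "of"}


def tokenize(title):
    """Lowercase, split on whitespace and drop stop words."""
    return [w for w in title.lower().split() if w not in STOPWORDS]


def document_term_matrix(titles):
    """Rows are titles, columns are vocabulary terms, cells are counts."""
    tokenized = [tokenize(t) for t in titles]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    matrix = [[tokens.count(w) for w in vocab] for tokens in tokenized]
    return vocab, matrix


def cooccurrence_edges(titles):
    """Count how often two distinct words appear in the same title.
    Each counted pair can serve as a weighted edge in a word network."""
    edges = Counter()
    for title in titles:
        for pair in combinations(sorted(set(tokenize(title))), 2):
            edges[pair] += 1
    return edges


# Hypothetical discussion titles.
titles = [
    "withdrawal after escitalopram",
    "escitalopram and weight gain",
    "venlafaxine withdrawal",
]
vocab, matrix = document_term_matrix(titles)
term_frequency = dict(zip(vocab, (sum(col) for col in zip(*matrix))))
edges = cooccurrence_edges(titles)
```

    Column sums of the matrix give the overall word frequencies, and the edge counts indicate which words tend to appear together in a title.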

  5. A Framework for Text Mining in Scientometric Study: A Case Study in Biomedicine Publications

    Science.gov (United States)

    Silalahi, V. M. M.; Hardiyati, R.; Nadhiroh, I. M.; Handayani, T.; Rahmaida, R.; Amelia, M.

    2018-04-01

    Data on Indonesian research publications in the domain of biomedicine have been collected and text mined for the purpose of a scientometric study. The goal is to build a predictive model that classifies research publications by their potential for downstreaming. The model is based on the drug development processes adapted from the literature. An effort is described to build the conceptual model and to develop a corpus of research publications in the domain of Indonesian biomedicine. Then an investigation is conducted into the problems associated with building a corpus and validating the model. Based on our experience, a framework is proposed to manage the scientometric study based on text mining. Our method shows the effectiveness of conducting a scientometric study based on text mining in order to get a valid classification model. This valid model is mainly supported by iterative and close interactions with the domain experts, from identifying the issues and building a conceptual model to labelling, validation and interpretation of results.

  6. Beyond accuracy: creating interoperable and scalable text-mining web services.

    Science.gov (United States)

    Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong

    2016-06-15

    The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl. Contact: Zhiyong.Lu@nih.gov. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

  7. Coronary artery disease risk assessment from unstructured electronic health records using text mining.

    Science.gov (United States)

    Jonnagaddala, Jitendra; Liaw, Siaw-Teng; Ray, Pradeep; Kumar, Manish; Chang, Nai-Wen; Dai, Hong-Jie

    2015-12-01

    Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data is not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents methods to extract Framingham risk factors from unstructured electronic health records using clinical text mining and to calculate 10-year coronary artery disease risk scores in a cohort of diabetic patients. We developed a rule-based system to extract risk factors: age, gender, total cholesterol, HDL-C, blood pressure, diabetes history and smoking history. The results showed that the output from the text mining system was reliable, but there was a significant amount of missing data to calculate the Framingham risk score. A systematic approach for understanding missing data was followed by implementation of imputation strategies. An analysis of the 10-year Framingham risk scores for coronary artery disease in this cohort has shown that the majority of the diabetic patients are at moderate risk of CAD. Copyright © 2015 Elsevier Inc. All rights reserved.
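    The rule-based extraction described above can be pictured with a minimal Python sketch. The regular expressions, field names and sample note below are hypothetical illustrations, not the rules used in the study; the point is only that each risk factor is matched by a pattern and that unmatched factors surface as missing data to be handled later (for example by imputation).

```python
import re

# Hypothetical patterns for illustration only; the study's actual rules
# are not given in the abstract.
PATTERNS = {
    "age": re.compile(r"\b(\d{1,3})[- ]year[- ]old\b", re.I),
    "total_cholesterol": re.compile(r"\btotal cholesterol\D{0,15}(\d+(?:\.\d+)?)", re.I),
    "hdl_c": re.compile(r"\bHDL(?:-C)?\D{0,15}(\d+(?:\.\d+)?)", re.I),
    "systolic_bp": re.compile(r"\b(?:BP|blood pressure)\D{0,15}(\d{2,3})\s*/\s*\d{2,3}", re.I),
}


def extract_risk_factors(note):
    """Return {factor: value or None}; None marks a factor not found in the note."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        found[name] = float(match.group(1)) if match else None
    return found


# Made-up clinical note used only to exercise the patterns.
note = "58-year-old male. BP 142/88. Total cholesterol: 213 mg/dL."
factors = extract_risk_factors(note)
missing = [name for name, value in factors.items() if value is None]
```

    Returning None for absent factors, rather than dropping them, keeps the missing-data problem visible, which is the issue the abstract reports for computing the Framingham score.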

  8. DrugQuest - a text mining workflow for drug association discovery.

    Science.gov (United States)

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis

    2016-06-06

    Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of biomedical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views, such as clustered chemicals based on their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest .

  9. Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL

    Science.gov (United States)

    Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo

    2011-01-01

    Repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are often so large and poorly structured that they have outgrown our capability to process their contents manually and extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experience exploring such methods in three main areas (historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL) has shown that obtaining useful results requires substantial unanticipated effort, from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates some of these benefits and the important lessons we learned from preparing and applying text mining to large unstructured system artifacts at JPL, with the aim of benefiting future TM applications in similar problem domains and in the hope that they can be extended to broader areas of application.

  10. Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures

    Science.gov (United States)

    Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.

    2017-08-01

    Text mining is one of the most promising ways of automatically extracting information from the huge biological literature. To exploit its potential, the knowledge encoded in the text should be converted to some semantic representation, such as entities and relations, which can be analyzed by machines. Large-scale practical systems for this purpose are rare, but text mining can still be helpful for generating or validating predictions. Cellulases, the enzymes that degrade cellulose, have abundant applications in various industries. The cellulase-producing bacterium Bacillus subtilis and fungus Pseudomonas putida were isolated from topsoil of Guntur Dt., A.P., India. Absolute cultures were conserved on potato dextrose agar medium for molecular studies. In this paper, we present how well text mining concepts can be used to analyze cellulase-producing bacteria and fungi; their comparative structures are also studied with the aid of well-established, high-quality standard bioinformatics tools such as BioEdit, Swiss-Prot, ProtParam and EMBOSSwin, with which complete data on cellulases, such as the structure and constituents of the enzyme, were obtained.

  11. Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan.

    Science.gov (United States)

    Olofsson, Hanna; Brolund, Agneta; Hellberg, Christel; Silverstein, Rebecca; Stenström, Karin; Österberg, Marie; Dagerhamn, Jessica

    2017-09-01

    One time-consuming aspect of conducting systematic reviews is the task of sifting through abstracts to identify relevant studies. One promising approach for reducing this burden uses text mining technology to identify those abstracts that are potentially most relevant for a project, allowing those abstracts to be screened first. This study examined the effectiveness of the text mining functionality of the abstract screening tool Rayyan and collected user experiences. Rayyan was used to screen abstracts for 6 reviews in 2015. After screening 25%, 50%, and 75% of the abstracts, the screeners logged the relevant references identified. A survey was sent to users. After screening half of the search result with Rayyan, 86% to 99% of the references deemed relevant to the study were identified. Of those studies included in the final reports, 96% to 100% were already identified in the first half of the screening process. Users rated Rayyan 4.5 out of 5. The text mining function in Rayyan successfully helped reviewers identify relevant studies early in the screening process. Copyright © 2017 John Wiley & Sons, Ltd.

  12. Text mining for literature review and knowledge discovery in cancer risk assessment and research.

    Directory of Open Access Journals (Sweden)

    Anna Korhonen

    Research in biomedical text mining is starting to produce technology which can make information in biomedical literature more accessible for bio-scientists. One of the current challenges is to integrate and refine this technology to support real-life scientific tasks in biomedicine, and to evaluate its usefulness in the context of such tasks. We describe CRAB - a fully integrated text mining tool designed to support chemical health risk assessment. This task is complex and time-consuming, requiring a thorough review of existing scientific data on a particular chemical. Covering human, animal, cellular and other mechanistic data from various fields of biomedicine, this is highly varied and therefore difficult to harvest from literature databases via manual means. Our tool automates the process by extracting relevant scientific data in published literature and classifying it according to multiple qualitative dimensions. Developed in close collaboration with risk assessors, the tool allows navigating the classified dataset in various ways and sharing the data with other users. We present a direct and user-based evaluation which shows that the technology integrated in the tool is highly accurate, and report a number of case studies which demonstrate how the tool can be used to support scientific discovery in cancer risk assessment and research. Our work demonstrates the usefulness of a text mining pipeline in facilitating complex research tasks in biomedicine. We discuss further development and application of our technology to other types of chemical risk assessment in the future.

  13. TIME SERIES ANALYSIS ON STOCK MARKET FOR TEXT MINING CORRELATION OF ECONOMY NEWS

    Directory of Open Access Journals (Sweden)

    Sadi Evren SEKER

    2014-01-01

    This paper proposes an information retrieval method for economy news. The effect of economy news is studied at the word level, and stock market values are taken as the ground truth. The correlation between stock market prices and economy news is an already addressed problem for most countries. The best-known approach is to apply text mining to the news and time series analysis techniques to stock market closing values, and then to apply classification or clustering algorithms to the extracted features. This study goes further and asks which time series analysis techniques are available for stock market closing values and which one is the most suitable. In this study, the news items and their dates are collected into a database and text mining is applied to the news; the text mining part has been kept simple, using only the term frequency-inverse document frequency (TF-IDF) method. For the time series analysis part, we have studied 10 different methods such as random walk, moving average, acceleration, Bollinger band, price rate of change, periodic average, difference, momentum or relative strength index and their variations. We also explain these techniques in a comparative way and apply the methods to Turkish Stock Market closing values over a period of more than 2 years. In addition, we have applied the TF-IDF method to the economy news of one of the high-circulation newspapers in Turkey.
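    As a rough illustration of two of the building blocks named in this abstract, the Python sketch below computes term frequency-inverse document frequency weights and a simple moving average. It assumes whitespace tokenization, relative term frequency, an unsmoothed log(N/df) weighting and non-empty documents; the paper's exact variants are not stated in the abstract, and the function names are our own.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def moving_average(values, window):
    """Simple moving average over a list of closing values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]
```

    A term that occurs in every document receives weight zero under this weighting, which is why TF-IDF favors words that distinguish one news item from the rest.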

  14. Adverse Event extraction from Structured Product Labels using the Event-based Text-mining of Health Electronic Records (ETHER) system.

    Science.gov (United States)

    Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Botsis, Taxiarchis; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard

    2018-01-01

    Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical products information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. Of the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated the ability of the Event-based Text-mining of Health Electronic Records system to extract and encode Adverse Event terms from Structured Product Labels, which may support multiple pharmacoepidemiological tasks.
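    For readers comparing the precision and recall figures reported across these records, the following Python sketch shows the standard definitions and the balanced F-measure (harmonic mean). The function names are our own, and any F-measure computed from the rounded figures above is a derived illustration, not a value reported by the authors.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Derived from the rounded precision (0.81) and recall (0.92) quoted above;
# this figure does not appear in the abstract itself.
derived_f = f_measure(0.81, 0.92)
```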

  15. Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

    Science.gov (United States)

    Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

    2015-06-06

    Advances in next generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence level association extraction, with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. The discourse level analysis component of MutD contributed a gain of more than 10% in F-measure when compared against sentence level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating

  16. Cluo: Web-Scale Text Mining System For Open Source Intelligence Purposes

    Directory of Open Access Journals (Sweden)

    Przemyslaw Maciolek

    2013-01-01

    The amount of textual information published on the Internet is considered to be in the billions of web pages, blog posts, comments, social media updates and others. Analyzing such quantities of data requires a high level of distribution of both data and computing. This is especially true in the case of complex algorithms, often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system, which extracts and analyzes significant quantities of openly available information.

  17. Appraising the Corporate Sustainability Reports - Text Mining and Multi-Discriminatory Analysis

    Science.gov (United States)

    Modapothala, J. R.; Issac, B.; Jayamani, E.

    The voluntary disclosure of sustainability reports by companies attracts wider stakeholder groups. Diversity in these reports poses a challenge to the users of the information and to regulators. This study appraises corporate sustainability reports as per GRI (Global Reporting Initiative) guidelines (the most widely accepted and used) across all industrial sectors. Text mining is adopted to carry out the initial analysis with a large sample size of 2650 reports. Statistical analyses were performed for further investigation. The results indicate that the disclosures made by the companies differ across the industrial sectors. Multivariate Discriminant Analysis (MDA) shows that the environmental variable is a more significant contributing factor toward explaining the sustainability reports.

  18. Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

    Science.gov (United States)

    Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J

    2014-01-01

    The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and

  19. tmBioC: improving interoperability of text-mining tools with BioC.

    Science.gov (United States)

    Khare, Ritu; Wei, Chih-Hsuan; Mao, Yuqing; Leaman, Robert; Lu, Zhiyong

    2014-01-01

    The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial effort and time owing to heterogeneity and variety in data formats. In response, BioC is a recent proposal that offers a minimalistic approach to tool interoperability by stipulating minimal changes to existing tools and applications. BioC is a family of XML formats that define how to present text documents and annotations, and also provides easy-to-use functions to read/write documents in the BioC format. In this study, we introduce our text-mining toolkit, which is designed to perform several challenging and significant tasks in the biomedical domain, and repackage the toolkit into BioC to enhance its interoperability. Our toolkit consists of six state-of-the-art tools for named-entity recognition, normalization and annotation (PubTator) of genes (GenNorm), diseases (DNorm), mutations (tmVar), species (SR4GN) and chemicals (tmChem). Although developed within the same group, each tool is designed to process input articles and output annotations in a different format. We modify these tools and enable them to read/write data in the proposed BioC format. We find that, using the BioC family of formats and functions, only minimal changes were required to build the newer versions of the tools. The resulting BioC wrapped toolkit, which we have named tmBioC, consists of our tools in BioC, an annotated full-text corpus in BioC, and a format detection and conversion tool. Furthermore, through participation in the 2013 BioCreative IV Interoperability Track, we empirically demonstrate that the tools in tmBioC can be more efficiently integrated with each other as well as with external tools: our experimental results show that using BioC reduces the lines of code needed for text-mining tool integration by >60%. The tmBioC toolkit

  20. Practical text mining and statistical analysis for non-structured text data applications

    CERN Document Server

    Miner, Gary; Hill, Thomas; Nisbet, Robert; Delen, Dursun

    2012-01-01

    The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase d

  1. Unsupervised text mining methods for literature analysis: a case study for Thomas Pynchon's V.

    Directory of Open Access Journals (Sweden)

    Christos Iraklis Tsatsoulis

    2013-08-01

    We investigate the use of unsupervised text mining methods for the analysis of prose literature works, using Thomas Pynchon's novel 'V.' as a case study. Our results suggest that such methods may be employed to reveal meaningful information regarding the novel's structure. We report results using a wide variety of clustering algorithms, several distinct distance functions, and different visualization techniques. The application of a simple topic model is also demonstrated. We discuss the meaningfulness of our results along with the limitations of our approach, and we suggest some possible paths for further study.

  2. DDMGD: the database of text-mined associations between genes methylated in diseases from different species

    KAUST Repository

    Raies, A. B.

    2014-11-14

    Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD, which we developed earlier, and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.

  3. Text Mining of the Classical Medical Literature for Medicines That Show Potential in Diabetic Nephropathy

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2014-01-01

    Objectives. To apply modern text-mining methods to identify candidate herbs and formulae for the treatment of diabetic nephropathy. Methods. The method we developed includes three steps: (1) identification of candidate ancient terms; (2) systematic search and assessment of medical records written in classical Chinese; (3) preliminary evaluation of the effect and safety of candidates. Results. The ancient terms Xia Xiao, Shen Xiao, and Xiao Shen were determined to be the most likely to correspond to diabetic nephropathy and were used in text mining. A total of 80 Chinese formulae for treating conditions congruent with diabetic nephropathy, recorded in medical books from the Tang Dynasty to the Qing Dynasty, were collected. Sao si tang (also called Reeling Silk Decoction) was chosen to show the process of preliminary evaluation of the candidates. It had promising potential for development as a new agent for the treatment of diabetic nephropathy. However, further investigations into its safety for patients with renal insufficiency are still needed. Conclusions. The methods developed in this study offer a targeted approach to identifying traditional herbs and/or formulae as candidates for further investigation in the search for new drugs for modern disease. However, more effort is still required to improve our techniques, especially with regard to compound formulae.

  4. Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.

    Science.gov (United States)

    Vazquez, Miguel; Krallinger, Martin; Leitner, Florian; Valencia, Alfonso

    2011-06-01

    Providing prior knowledge about biological properties of chemicals, such as kinetic values, protein targets, or toxic effects, can facilitate many aspects of drug development. Chemical information is rapidly accumulating in all sorts of free text documents like patents, industry reports, or scientific articles, which has motivated the development of specifically tailored text mining applications. Despite the potential gains, chemical text mining still faces significant challenges. One of the most salient is the recognition of chemical entities mentioned in text. To help practitioners contribute to this area, a good portion of this review is devoted to this issue, and presents the basic concepts and principles underlying the main strategies. The technical details are introduced and accompanied by relevant bibliographic references. Other tasks discussed are retrieving relevant articles, identifying relationships between chemicals and other entities, or determining the chemical structures of chemicals mentioned in text. This review also introduces a number of published applications that can be used to build pipelines in topics like drug side effects, toxicity, and protein-disease-compound network analysis. We conclude the review with an outlook on how we expect the field to evolve, discussing its possibilities and its current limitations. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

    Science.gov (United States)

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

    2015-01-01

    Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD, which we developed earlier, and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. U-Compare: share and compare text mining tools with UIMA

    Science.gov (United States)

    Kano, Yoshinobu; Baumgartner, William A.; McCrohon, Luke; Ananiadou, Sophia; Cohen, K. Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi

    2009-01-01

    Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. Availability: http://u-compare.org/ Contact: kano@is.s.u-tokyo.ac.jp PMID:19414535

  7. A text-mining system for extracting metabolic reactions from full-text articles.

    Science.gov (United States)

    Czarnecki, Jan; Nobeli, Irene; Smith, Adrian M; Shepherd, Adrian J

    2012-07-23

    Increasingly, biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway - metabolic pathways - has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein-protein interactions. When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.

  8. Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

    Energy Technology Data Exchange (ETDEWEB)

    Hirdt, J.A. [Department of Mathematics and Computer Science, St. Joseph's College, Patchogue, NY 11772 (United States)]; Brown, D.A., E-mail: dbrown@bnl.gov [National Nuclear Data Center, Brookhaven National Laboratory, Upton, NY 11973-5000 (United States)]

    2016-01-15

    The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
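The graph construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the node labels and links below are invented placeholders, and ranking nodes by degree is only one simple stand-in for the "various graph theoretical tools" the abstract mentions.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as an adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_ranking(adjacency):
    """Return nodes sorted by number of neighbours (highest first, ties by name)."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))


# Hypothetical nodes: each stands for one "reaction and quantity" pair.
edges = [
    ("reaction_A", "monitor_M"),
    ("reaction_B", "monitor_M"),
    ("reaction_C", "monitor_M"),
    ("reaction_A", "reaction_B"),
]

graph = build_graph(edges)
ranking = degree_ranking(graph)
```

In this toy data the node used as a monitor by several other reactions ends up with the most links, which loosely mirrors the abstract's observation that monitor cross sections dominate the list of important nodes.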

  9. HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

    Science.gov (United States)

    Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

    2015-04-01

    Knowledge of protein-protein interactions (PPI) and of their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable to understand the mechanism of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Text mining for search term development in systematic reviewing: A discussion of some methods and challenges.

    Science.gov (United States)

    Stansfield, Claire; O'Mara-Eves, Alison; Thomas, James

    2017-09-01

    Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.
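Term frequency-inverse document frequency, one of the technologies named in this abstract, can be sketched in a few lines. This is an illustrative assumption-laden example, not the tooling the authors used: the whitespace tokenizer, the `tf_idf` and `top_terms` helpers, and the two sample "documents" are all invented here, and the weighting is the plain `tf * log(N / df)` variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each document with tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


def top_terms(score, k=3):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ordered = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ordered[:k]]


# Hypothetical mini-corpus standing in for titles/abstracts of known relevant records.
docs = [
    "smoking cessation smoking",
    "cessation programme evaluation",
]
scores = tf_idf(docs)
candidates = top_terms(scores[0], k=1)
```

Terms that appear in every document score zero, so a term shared by all records drops out while terms distinctive to one record surface as candidate search terms.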

  11. PathText: a text mining integrator for biological pathway visualizations

    Science.gov (United States)

    Kemper, Brian; Matsuzaki, Takuya; Matsuoka, Yukiko; Tsuruoka, Yoshimasa; Kitano, Hiroaki; Ananiadou, Sophia; Tsujii, Jun'ichi

    2010-01-01

    Motivation: Metabolic and signaling pathways are an increasingly important part of organizing knowledge in systems biology. They serve to integrate collective interpretations of facts scattered throughout literature. Biologists construct a pathway by reading a large number of articles and interpreting them as a consistent network, but most of the models constructed currently lack direct links to those articles. Biologists who want to check the original articles have to spend substantial amounts of time to collect relevant articles and identify the sections relevant to the pathway. Furthermore, with the scientific literature expanding by several thousand papers per week, keeping a model relevant requires a continuous curation effort. In this article, we present a system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This will enable biologists to freely move between parts of a pathway and relevant sections of articles, as well as identify relevant papers from large text bases. The system, PathText, is developed by Systems Biology Institute, Okinawa Institute of Science and Technology, National Centre for Text Mining (University of Manchester) and the University of Tokyo, and is being used by groups of biologists from these locations. Contact: brian@monrovian.com. PMID:20529930

  12. Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

    International Nuclear Information System (INIS)

    Hirdt, J.A.; Brown, D.A.

    2016-01-01

    The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.

  13. Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature.

    Science.gov (United States)

    Wang, Qinghua; Ross, Karen E; Huang, Hongzhan; Ren, Jia; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N

    2017-01-01

    Post-translational modifications (PTMs) are one of the main contributors to the diversity of proteoforms in the proteomic landscape. In particular, protein phosphorylation represents an essential regulatory mechanism that plays a role in many biological processes. Protein kinases, the enzymes catalyzing this reaction, are key participants in metabolic and signaling pathways. Their activation or inactivation dictate downstream events: what substrates are modified and their subsequent impact (e.g., activation state, localization, protein-protein interactions (PPIs)). The biomedical literature continues to be the main source of evidence for experimental information about protein phosphorylation. Automatic methods to bring together phosphorylation events and phosphorylation-dependent PPIs can help to summarize the current knowledge and to expose hidden connections. In this chapter, we demonstrate two text mining tools, RLIMS-P and eFIP, for the retrieval and extraction of kinase-substrate-site data and phosphorylation-dependent PPIs from the literature. These tools offer several advantages over a literature search in PubMed as their results are specific for phosphorylation. RLIMS-P and eFIP results can be sorted, organized, and viewed in multiple ways to answer relevant biological questions, and the protein mentions are linked to UniProt identifiers.

  14. What Online Communities Can Tell Us About Electronic Cigarettes and Hookah Use: A Study Using Text Mining and Visualization Techniques.

    Science.gov (United States)

    Chen, Annie T; Zhu, Shu-Hong; Conway, Mike

    2015-09-29

    The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people's experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a novel approach to understanding the use and appeal of these emerging products by applying text mining techniques to compare consumer experiences across discussion forums. This study examined content from the websites Vapor Talk, Hookah Forum, and Reddit to understand people's experiences with different tobacco products. Our investigation involves three parts. First, we identified contextual factors that inform our understanding of tobacco use behaviors, such as setting, time, social relationships, and sensory experience, and compared the forums to identify the ones where content on these factors is most common. Second, we compared how the tobacco use experience differs with combustible cigarettes and e-cigarettes. Third, we investigated differences between e-cigarette and hookah use. In the first part of our study, we employed a lexicon-based extraction approach to estimate prevalence of contextual factors, and then we generated a heat map based on these estimates to compare the forums. In the second and third parts of the study, we employed a text mining technique called topic modeling to identify important topics and then developed a visualization, Topic Bars, to compare topic coverage across forums. In the first part of the study, we identified two forums, Vapor Talk Health & Safety and the Stopsmoking subreddit, where discussion concerning contextual factors was particularly common. The second part showed that the discussion in Vapor Talk Health & Safety focused on symptoms and comparisons of combustible cigarettes and e

  15. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

    Science.gov (United States)

    Singhal, Ayush; Simmons, Michael; Lu, Zhiyong

    2016-11-01

    The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease

  16. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

    Directory of Open Access Journals (Sweden)

    Ayush Singhal

    2016-11-01

    The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. 
Across all diseases, our approach returned 272 triplets

  17. Research trends on Big Data in Marketing: A text mining and topic modeling based literature analysis

    Directory of Open Access Journals (Sweden)

    Alexandra Amado

    2018-01-01

    Given the research interest on Big Data in Marketing, we present a research literature analysis based on a text mining semi-automated approach with the goal of identifying the main trends in this domain. In particular, the analysis focuses on relevant terms and topics related with five dimensions: Big Data, Marketing, Geographic location of authors’ affiliation (countries and continents), Products, and Sectors. A total of 1560 articles published from 2010 to 2015 were scrutinized. The findings revealed that research is bipartite between technological and research domains, with Big Data publications not clearly aligning cutting edge techniques toward Marketing benefits. Also, few inter-continental co-authored publications were found. Moreover, findings show that research in Big Data applications to Marketing is still in an embryonic stage, thus making it essential to develop more direct efforts toward business for Big Data to thrive in the Marketing arena.

  18. Interactive text mining with Pipeline Pilot: a bibliographic web-based tool for PubMed.

    Science.gov (United States)

    Vellay, S G P; Latimer, N E Miller; Paillard, G

    2009-06-01

    Text mining has become an integral part of all research in the medical field. Many text analysis software platforms support particular use cases and only those. We show an example of a bibliographic tool that can be used to support virtually any use case in an agile manner. Here we focus on a Pipeline Pilot web-based application that interactively analyzes and reports on PubMed search results. This will be of interest to any scientist to help identify the most relevant papers in a topical area more quickly and to evaluate the results of query refinement. Links with Entrez databases help both the biologist and the chemist alike. We illustrate this application with Leishmaniasis, a neglected tropical disease, as a case study.

  19. Internet of Things in Health Trends Through Bibliometrics and Text Mining.

    Science.gov (United States)

    Konstantinidis, Stathis Th; Billis, Antonis; Wharrad, Heather; Bamidis, Panagiotis D

    2017-01-01

    Recently a new buzzword has slowly but surely emerged, namely the Internet of Things (IoT). The importance of IoT is identified worldwide both by organisations and governments and the scientific community with an increasing number of publications during the last few years. IoT in Health is one of the main pillars of this evolution, but limited research has been performed on future visions and trends. Thus, in this study we investigate the longitudinal trends of Internet of Things in Health through bibliometrics and use of text mining. Seven hundred seventy eight (778) articles were retrieved from the Web of Science database from 1998 to 2016. The publications are grouped into thirty (30) clusters based on abstract text analysis, resulting in some eight (8) trends of IoT in Health. Research in this field is obviously obtaining a worldwide character with specific trends, which are worth delineating to be in favour of some areas.

  20. Tracing Knowledge Transfer from Universities to Industry: A Text Mining Approach

    DEFF Research Database (Denmark)

    Woltmann, Sabrina; Alkærsig, Lars

    2017-01-01

    This paper identifies transferred knowledge between universities and the industry by proposing the use of a computational linguistic method. Current research on university-industry knowledge exchange relies often on formal databases and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. We, on the other hand, use the texts from university abstracts to identify university knowledge and compare them with texts from firm webpages. We use these text data to identify common key words and thereby identify overlapping contents among the texts. As method we use a well-established word ranking method from the field of information retrieval term frequency–inverse document frequency (TFIDF) to identify commonalities between texts from university. In examining the outcomes of the TFIDF statistic we find... is the first step to enable the identification of common knowledge and knowledge transfer via text mining to increase its measurability.

  1. The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews.

    Science.gov (United States)

    Hao, Haijing; Zhang, Kunpeng

    2016-05-10

    Many Web-based health care platforms allow patients to evaluate physicians by posting open-end textual reviews based on their experiences. These reviews are helpful resources for other patients to choose high-quality doctors, especially in countries like China where no doctor referral systems exist. Analyzing such a large amount of user-generated content to understand the voice of health consumers has attracted much attention from health care providers and health care researchers. The aim of this paper is to automatically extract hidden topics from Web-based physician reviews using text-mining techniques to examine what Chinese patients have said about their doctors and whether these topics differ across various specialties. This knowledge will help health care consumers, providers, and researchers better understand this information. We conducted two-fold analyses on the data collected from the "Good Doctor Online" platform, the largest online health community in China. First, we explored all reviews from 2006-2014 using descriptive statistics. Second, we applied the well-known topic extraction algorithm Latent Dirichlet Allocation to more than 500,000 textual reviews from over 75,000 Chinese doctors across four major specialty areas to understand what Chinese health consumers said online about their doctor visits. On the "Good Doctor Online" platform, 112,873 out of 314,624 doctors had been reviewed at least once by April 11, 2014. Among the 772,979 textual reviews, we chose to focus on four major specialty areas that received the most reviews: Internal Medicine, Surgery, Obstetrics/Gynecology and Pediatrics, and Chinese Traditional Medicine. Among the doctors who received reviews from those four medical specialties, two-thirds of them received more than two reviews and in a few extreme cases, some doctors received more than 500 reviews. 
Across the four major areas, the most popular topics reviewers found were the experience of finding doctors, doctors' technical

  2. Assertions of Japanese Websites for and Against Cancer Screening: a Text Mining Analysis

    Science.gov (United States)

    Okuhara, Tsuyoshi; Ishikawa, Hirono; Okada, Masahumi; Kato, Mio; Kiuchi, Takahiro

    2017-04-01

    Background: Cancer screening rates are lower in Japan than in Western countries such as the United States and the United Kingdom. While health professionals publish pro-cancer-screening messages online to encourage proactive seeking for screening, anti-screening activists use the same medium to warn readers against following guidelines. Contents of pro- and anti-cancer-screening sites may contribute to readers’ acceptance of one or the other position. We aimed to use a text-mining method to examine frequently appearing contents on sites for and against cancer screening. Methods: We conducted online searches in December 2016 using two major search engines in Japan (Google Japan and Yahoo! Japan). Targeted websites were classified as “pro”, “anti”, or “neutral” depending on their claims, with the author(s) classified as “health professional”, “mass media”, or “layperson”. Text-mining analyses were conducted, and statistical analysis was performed using the chi-square test. Results: Of the 169 websites analyzed, the top-three most frequently appearing content topics in pro sites were reducing mortality via cancer screening, benefits of early detection, and recommendations for obtaining detailed examination. The top three most frequent in anti-sites were harm from radiation exposure, non-efficacy of cancer screening, and lack of necessity of early detection. Anti-sites also frequently referred to a well-known Japanese radiologist, Makoto Kondo, who rejects the standard forms of cancer care. Conclusion: Our findings should enable authors of pro-cancer-screening sites to write to counter misleading anti-cancer-screening messages and facilitate dissemination of accurate information. Creative Commons Attribution License

  3. Reproducibility of studies on text mining for citation screening in systematic reviews: Evaluation and checklist.

    Science.gov (United States)

    Olorisade, Babatunde Kazeem; Brereton, Pearl; Andras, Peter

    2017-09-01

    Independent validation of published scientific results through study replication is a pre-condition for accepting the validity of such results. In computation research, full replication is often unrealistic for independent results validation; therefore, study reproduction has been justified as the minimum acceptable standard to evaluate the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational field with high relevance for software engineering, medical research and other fields. However, there is little work so far on reproduction studies in the field. In this paper, we investigate the reproducibility of studies in this area based on information contained in published articles and we propose reporting guidelines that could improve reproducibility. The study was approached in two ways. Initially we attempted to reproduce results from six studies, which were based on the same raw dataset. Then, based on this experience, we identified steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is given the information provided on these steps. 33 articles were systematically assessed for reproducibility using this approach. Our work revealed that it is currently difficult if not impossible to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits reproducibility of about 80% of the studies assessed. Also, information about the machine learning algorithms is inadequate in about 27% of the papers. On the plus side, the third-party software tools used are mostly free and available. 
The reproducibility potential of most of the studies can be significantly improved if more attention is paid to information provided on the datasets used, how they were partitioned and utilized, and

  4. From university research to innovation: Detecting knowledge transfer via text mining

    Energy Technology Data Exchange (ETDEWEB)

    Woltmann, S.; Clemmensen, L.; Alkærsig, L

    2016-07-01

    Knowledge transfer by universities is a top priority in innovation policy and a primary purpose for public research funding, due to being an important driver of technical change and innovation. Current empirical research on the impact of university research relies mainly on formal databases and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern recognition. Text samples for this purpose can include files containing social media contents, company websites and annual reports. The empirical focus in the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent text collections (corpora) to identify correlations of university publications and company webpages. One corpus represents the company sites, serving as a sample of the private economy, and a second corpus, containing relevant publications, provides the reference to the university research. We associated the former with the latter to obtain insights into possible text and semantic relatedness. The text mining methods extrapolate the correlations, semantic patterns and content comparison of the two corpora to define the document relatedness. We expect the development of a novel tool using contemporary techniques for the measurement of public research impact. The approach aims to be applicable across universities and thus enable a more holistic comparable assessment. This relies less on formal databases, which is certainly beneficial in terms of the data reliability. We seek to provide a supplementary perspective for the detection of the dissemination of university research and hereby enable policy makers to gain additional insights into (informal) contributions of knowledge

  5. A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

    Science.gov (United States)

    Westergaard, David; Stærfeldt, Hans-Henrik; Tønsberg, Christian; Jensen, Lars Juhl; Brunak, Søren

    2018-02-01

    Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.

  6. A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

    Science.gov (United States)

    Westergaard, David; Stærfeldt, Hans-Henrik

    2018-01-01

    Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only. PMID:29447159

  7. A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

    DEFF Research Database (Denmark)

    Westergaard, David; Stærfeldt, Hans Henrik; Tønsberg, Christian

    2018-01-01

    Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full...

  8. 77 FR 71445 - Regulatory and Administrative Waivers Granted for Multifamily Housing Programs To Assist With...

    Science.gov (United States)

    2012-11-30

    ... DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT [Docket No. 5677-N-01] Regulatory and Administrative Waivers Granted for Multifamily Housing Programs To Assist With Recovery and Relief in Sandy Disaster... in the disaster areas is widespread, and the need for regulatory relief in many areas pertaining to...

  9. Chemical Topic Modeling: Exploring Molecular Data Sets Using a Common Text-Mining Approach.

    Science.gov (United States)

    Schneider, Nadine; Fechner, Nikolas; Landrum, Gregory A; Stiefl, Nikolaus

    2017-08-28

    Big data is one of the key transformative factors which increasingly influences all aspects of modern life. Although this transformation brings vast opportunities, it also generates novel challenges, not the least of which is organizing and searching this data deluge. The field of medicinal chemistry is no different: more and more data are being generated, for instance, by technologies such as DNA encoded libraries, peptide libraries, text mining of large literature corpora, and new in silico enumeration methods. Handling those huge sets of molecules effectively is quite challenging and requires compromises that often come at the expense of the interpretability of the results. In order to find an intuitive and meaningful approach to organizing large molecular data sets, we adopted a probabilistic framework called "topic modeling" from the text-mining field. Here we present the first chemistry-related implementation of this method, which allows large molecule sets to be assigned to "chemical topics" and the relationships between those topics to be investigated. In this first study, we thoroughly evaluate this novel method in different experiments and discuss both its disadvantages and advantages. We show very promising results in reproducing human-assigned concepts using the approach to identify and retrieve chemical series from sets of molecules. We have also created an intuitive visualization of the chemical topics output by the algorithm. This is a huge benefit compared to other unsupervised machine-learning methods, like clustering, which are commonly used to group sets of molecules. Finally, we applied the new method to the 1.6 million molecules of the ChEMBL22 data set to test its robustness and efficiency. In about 1 h we built a 100-topic model of this large data set in which we could identify interesting topics like "proteins", "DNA", or "steroids". Along with this publication we provide our data sets and an open-source implementation of the new method (CheTo) which

  10. Development of Workshops on Biodiversity and Evaluation of the Educational Effect by Text Mining Analysis

    Science.gov (United States)

    Baba, R.; Iijima, A.

    2014-12-01

    Conservation of biodiversity is one of the key issues in environmental studies. As a means to address this issue, education is becoming increasingly important. In previous work, we developed a course of workshops on the conservation of biodiversity. To disseminate the course as a tool for environmental education, determining its educational effect is essential. Text mining enables analyses of the frequency and co-occurrence of words in free-form texts. This study is intended to evaluate the effect of the workshop using a text mining technique. We hosted the originally developed workshop on the conservation of biodiversity for 22 college students. The aim of the workshop was to convey the definition of biodiversity. Generally, biodiversity refers to the diversity of ecosystems, diversity between species, and diversity within species. To facilitate discussion, supplementary materials were used. For instance, field guides of wildlife species were used to discuss the diversity of ecosystems. Moreover, a hierarchical framework in an ecological pyramid was shown for understanding the role of diversity between species. In addition, we offered a document on the historical Potato Famine in Ireland to discuss diversity within species from the genetic viewpoint. Before and after the workshop, we asked students for a free description of the definition of biodiversity, and analyzed the descriptions using Tiny Text Miner. This tool enables Japanese language morphological analysis. Frequently used words were sorted into categories. Moreover, a principal component analysis was carried out. After the workshop, the frequency of the words tagged to diversity between species and diversity within species increased significantly. In the principal component analysis, the 1st component consists of words such as producer, consumer, decomposer, and food chain. This indicates that the students have comprehended the close relationship between
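
    The frequency and co-occurrence analysis described in this abstract can be sketched in a few lines. This is a minimal illustration, not the Tiny Text Miner workflow: it assumes English text split on whitespace (the study used Japanese morphological analysis), and the sample responses and function names are invented for the example.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?()\"'") for word in text.lower().split())
    return [t for t in tokens if t]


def word_frequencies(responses):
    """Count how often each word occurs across all responses."""
    counts = Counter()
    for response in responses:
        counts.update(tokenize(response))
    return counts


def cooccurrences(responses):
    """Count unordered word pairs that appear in the same response."""
    pairs = Counter()
    for response in responses:
        vocabulary = sorted(set(tokenize(response)))
        pairs.update(combinations(vocabulary, 2))
    return pairs


# Hypothetical free-text answers, invented for illustration only.
responses = [
    "Diversity of species in an ecosystem",
    "Genetic diversity within species",
    "Diversity between species and within species",
]
freq = word_frequencies(responses)
pairs = cooccurrences(responses)
print(freq.most_common(3))
print(pairs[("diversity", "species")])
```

    Comparing such counts before and after a workshop is one simple way to see which concepts became more prominent in the answers.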

  11. Sustainable Supply Chain Based on News Articles and Sustainability Reports: Text Mining with Leximancer and DICTION

    Directory of Open Access Journals (Sweden)

    Dongwook Kim

    2017-06-01

    The purpose of this research is to explore sustainable supply chain management (SSCM) trends, and firms’ strategic positioning and execution with regard to sustainability in the textile and apparel industry based on news articles and sustainability reports. Further analysis of the rhetoric in chief executive officer (CEO) letters within sustainability reports is used to determine firms’ resoluteness, positive entailments, sharing of values, perception of reality, and sustainability strategy and execution feasibility. Computer-based content analysis is used for this research: Leximancer is applied for text analysis, while the dictionary-based text mining program DICTION and SPSS are used for rhetorical analysis. Overall, contents similar to the literature on the environmental, social, and economic aspects of the triple bottom line (TBL) are observed; however, topics such as regulation, green incentives, and international standards are not readily observed. Furthermore, ethical issues, sustainable production, quality, and customer roles are emphasized in the texts analyzed. The CEO letter analysis indicates that listed firms show relatively low realism and high commonality, while North American firms exhibit relatively high commonality, and European firms show relatively high realism. The results will serve as a baseline for providing academia with guidelines in SSCM research, and provide an opportunity for businesses to complement their sustainability strategies and executions.

  12. Data Processing and Text Mining Technologies on Electronic Medical Records: A Review

    Directory of Open Access Journals (Sweden)

    Wencheng Sun

    2018-01-01

    Currently, medical institutes generally use electronic medical records (EMR) to record patients’ conditions, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and the data mining results. Different types of data require different processing technologies. Most structured data commonly needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contains more health information and requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and analyzes the key techniques in detail. In addition, we make an in-depth study of the applications developed based on text mining, together with the open challenges and research issues for future work.

  13. Natural products for chronic cough: Text mining the East Asian historical literature for future therapeutics.

    Science.gov (United States)

    Shergis, Johannah Linda; Wu, Lei; May, Brian H; Zhang, Anthony Lin; Guo, Xinfeng; Lu, Chuanjian; Xue, Charlie Changli

    2015-08-01

    Chronic cough is a significant health burden. Patients experience variable benefits from over-the-counter and prescribed products, but there is an unmet need to provide more effective treatments. Natural products have been used to treat cough, and some plant compounds such as pseudoephedrine from ephedra and codeine from opium poppy have been developed into drugs. Text mining historical literature may offer new insight for future therapeutic development. We identified natural products used in the East Asian historical literature to treat chronic cough. Evaluation of the historical literature revealed 331 natural products used to treat chronic cough. Products included plants, minerals and animal substances. These natural products were found in 75 different books published between AD 363 and 1911. Of the 331 products, the 10 most frequently and continually used products were examined, taking into consideration findings from contemporary experimental studies. The natural products identified are promising and offer new directions in therapeutic development for treating chronic cough. © The Author(s) 2015.

  14. A method for extracting design rationale knowledge based on Text Mining

    Directory of Open Access Journals (Sweden)

    Liu Jihong

    2017-01-01

    Capturing design rationale (DR) knowledge and presenting it to designers in a good form is of great significance for design reuse and design innovation. Since design rationale research began to develop in the 1970s, many teams have developed their own design rationale systems. However, DR acquisition systems are not intelligent enough, and they still require designers to perform many operations. In addition, existing design documents contain a large amount of DR knowledge, but it has not been well mined. Therefore, a method and system are needed to better extract DR knowledge from design documents. We have proposed a DRKH (design rationale knowledge hierarchy) model for DR representation. The DRKH model has three layers: the design intent layer, the design decision layer and the design basis layer. In this paper, we use a text mining method to extract DR from design documents and construct the DR model. Finally, a welding robot design specification is taken as an example to demonstrate the system interface.

  15. A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques

    International Nuclear Information System (INIS)

    Braga, Fabiane dos Reis

    2013-01-01

    This thesis presents a text mining method for the semi-automatic extraction of a taxonomy of concepts from a textual corpus composed of scientific papers related to the nuclear area. Text classification is a natural human practice and a crucial task for working with large repositories. The document clustering technique provides a logical and understandable framework that facilitates organization, browsing and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model generates high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider the relationships between words, assuming that they are independent of each other. The methodology combines a model for representing documents by concepts with a hierarchical document clustering method based on the co-occurrence frequency of concepts and a technique for labeling clusters with their most representative concepts, with the objective of producing a taxonomy of concepts that may reflect the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of scientific production in the nuclear area and thus support the management of research activities in this area. (author)
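
    The limitation of the bag-of-words model noted in this abstract, that different words with the same meaning are treated as unrelated, can be illustrated with a small sketch. The word-to-concept mapping, the sample documents and all function names below are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical word-to-concept mapping, invented for illustration only.
CONCEPTS = {
    "reactor": "REACTOR",
    "pile": "REACTOR",
    "coolant": "COOLANT",
    "refrigerant": "COOLANT",
}


def bag_of_words(text):
    """Represent a document as raw word counts."""
    return Counter(text.lower().split())


def bag_of_concepts(text):
    """Represent a document as concept counts; unmapped words stay as-is."""
    return Counter(CONCEPTS.get(word, word) for word in text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc_a = "reactor coolant leak"
doc_b = "pile refrigerant leak"
word_similarity = cosine(bag_of_words(doc_a), bag_of_words(doc_b))
concept_similarity = cosine(bag_of_concepts(doc_a), bag_of_concepts(doc_b))
print(word_similarity, concept_similarity)
```

    With word counts the two documents share only one term, while the concept representation treats them as identical, which is the kind of gain a concept-based document model aims for before clustering.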

  16. Community challenges in biomedical text mining over 10 years: success, failure and the future.

    Science.gov (United States)

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.

  17. Whole field tendencies in transcranial magnetic stimulation: A systematic review with data and text mining.

    Science.gov (United States)

    Dias, Alvaro Machado; Mansur, Carlos Gustavo; Myczkowski, Martin; Marcolin, Marco

    2011-06-01

    Transcranial magnetic stimulation (TMS) has played an important role in the fields of psychiatry, neurology and neuroscience, since its emergence in the mid-1980s; and several high quality reviews have been produced since then. Most high quality reviews serve as powerful tools in the evaluation of predefined tendencies, but they cannot actually uncover new trends within the literature. However, special statistical procedures to 'mine' the literature have been developed which aid in achieving such a goal. This paper aims to uncover patterns within the literature on TMS as a whole, as well as specific trends in the recent literature on TMS for the treatment of depression. Data mining and text mining. Currently there are 7299 publications, which can be clustered in four essential themes. Considering the frequency of the core psychiatric concepts within the indexed literature, the main results are: depression is present in 13.5% of the publications; Parkinson's disease in 2.94%; schizophrenia in 2.76%; bipolar disorder in 0.158%; and anxiety disorder in 0.142% of all the publications indexed in PubMed. Several other perspectives are discussed in the article. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. Mining concepts of health responsibility using text mining and exploratory graph analysis.

    Science.gov (United States)

    Kjellström, Sofia; Golino, Hudson

    2018-05-24

    Occupational therapists need to know about people's beliefs about personal responsibility for health to help them pursue everyday activities. The study aims to employ state-of-the-art quantitative approaches to understand people's views of health and responsibility at different ages. A mixed method approach was adopted, using text mining to extract information from 233 interviews with participants aged 5 to 96 years, and then exploratory graph analysis to estimate the number of latent variables. The fit of the structure estimated via the exploratory graph analysis was verified using confirmatory factor analysis. Exploratory graph analysis estimated three dimensions of health responsibility: (1) creating good health habits and feeling good; (2) thinking about one's own health and wanting to improve it; and (3) adopting explicitly normative attitudes to take care of one's health. The comparison between the three dimensions among age groups showed, in general, that children and adolescents, as well as the old elderly (>73 years old), expressed ideas about personal responsibility for health less than young adults, adults and the young elderly. Occupational therapists' knowledge of the concepts of health responsibility is of value when working with a patient's health, but an identified challenge is how to engage children and older persons.

  19. New challenges for text mining: mapping between text and manually curated pathways

    Science.gov (United States)

    Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi

    2008-01-01

    Background: Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results: To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions: We believe that the creation of such rich resources and their detailed analysis is a significant first step for accelerating the research of the automatic construction of pathway from text. PMID:18426550

  20. [Exploring the clinical characters of Shugan Jieyu capsule through text mining].

    Science.gov (United States)

    Pu, Zheng-Ping; Xia, Jiang-Ming; Xie, Wei; He, Jin-Cai

    2017-09-01

    The study aimed to explore the clinical characteristics of Shugan Jieyu capsule through text mining. Literature on Shugan Jieyu capsule published from May 2009 to Jan 2016 was retrieved from the CMCC database. Rules of Chinese medical patterns, diseases, symptoms and combination treatment were mined by a data slicing algorithm and presented in frequency tables and a two-dimensional network. In total, 190 articles were included. The outcomes suggested that Shugan Jieyu capsule was most frequently correlated with liver Qi stagnation. Primary depression, depression due to brain disease, concomitant depression following physical diseases, concomitant depression following schizophrenia, and functional dyspepsia were the main diseases treated with Shugan Jieyu capsule. Symptoms such as low mood, psychic anxiety, somatic anxiety and autonomic nerve dysfunction were mainly relieved by Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. The research suggested that the syndrome types and mining results for Shugan Jieyu capsule were almost the same as its instructions. Syndrome of malnutrition of heart spirit was a potential Chinese medical pattern for Shugan Jieyu capsule. Primary comorbid anxiety and depression, concomitant comorbid anxiety and depression following physical diseases, and postpartum depression were potential diseases treated with Shugan Jieyu capsule. Copyright© by the Chinese Pharmaceutical Association.

  1. Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

    Science.gov (United States)

    Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

    2016-04-26

    Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how overlapping or diverse the discoveries and the interests of various research groups in different countries are. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries, though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.

  2. An unsupervised text mining method for relation extraction from biomedical literature.

    Directory of Open Access Journals (Sweden)

    Changqin Quan

    The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. The pattern clustering algorithm is based on the polynomial kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein-protein interaction extraction, and (2) gene-suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods. The three supervised methods are rule based, SVM based, and kernel based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation on gene-suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than the co-occurrence based method.
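
    For readers unfamiliar with the baseline and metric mentioned at the end of this abstract, the following sketch shows a sentence-level co-occurrence baseline and the F-score used to evaluate it. It does not reproduce the paper's method or data: the entity names, sentences and gold-standard pairs are placeholders invented for the example.

```python
def cooccurrence_pairs(sentences, entities_a, entities_b):
    """Predict a relation for every (a, b) pair mentioned in one sentence."""
    predicted = set()
    for sentence in sentences:
        cleaned = sentence.lower().replace(".", " ").replace(",", " ")
        words = set(cleaned.split())
        for a in entities_a:
            for b in entities_b:
                if a in words and b in words:
                    predicted.add((a, b))
    return predicted


def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 of predicted pairs against gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder sentences and gold relations, invented for illustration only.
sentences = [
    "gene_a binds protein_x.",
    "gene_b was measured, but protein_x was unchanged.",
    "gene_b activates protein_y.",
]
gold = {("gene_a", "protein_x"), ("gene_b", "protein_y")}
predicted = cooccurrence_pairs(
    sentences, ["gene_a", "gene_b"], ["protein_x", "protein_y"]
)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(sorted(predicted), precision, recall, f1)
```

    The second sentence shows the typical weakness of co-occurrence: both entities are mentioned together without any stated interaction, so the baseline gains recall at the cost of precision.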

  3. Text mining analysis of public comments regarding high-level radioactive waste disposal

    International Nuclear Information System (INIS)

    Kugo, Akihide; Yoshikawa, Hidekazu; Shimoda, Hiroshi; Wakabayashi, Yasunaga

    2005-01-01

    In order to narrow the risk perception gap, as seen in social investigations, between the general public and people who are involved in the nuclear industry, public comments on high-level radioactive waste (HLW) disposal have been collected to find the significant talking points with the general public for constructing an effective risk communication model of social risk information regarding HLW disposal. Text mining was introduced to examine public comments to identify the core public interest underlying the comments. The text mining method used is to cluster specific groups of words with negative meanings and then to analyze public understanding by employing text structural analysis to extract words from subjective expressions. Using these procedures, it was found that the public does not trust the nuclear fuel cycle promotion policy and shows signs of anxiety about the long-lasting technological reliability of waste storage. To develop effective social risk communication of HLW issues, these findings are expected to help experts in the nuclear industry to communicate with the general public more effectively to obtain their trust. (author)

  4. The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

    Directory of Open Access Journals (Sweden)

    Reza Samizade

    2018-06-01

    Classification of cyber texts and comments by social media users into the two categories of positive and negative sentiment is of high importance in research areas related to text mining. In this research, we applied supervised classification methods to classify Persian texts in cyberspace based on sentiment. The result of this research is a system that can decide whether a comment published in cyberspace, such as on social networks, is positive or negative. The comments published on Persian movie and movie review websites from 1392 to 1395 are used as the data set for this research. Part of these data is used for training and the rest for testing. Prior to implementing the algorithms, pre-processing activities such as tokenizing, removing stop words, and n-gram processing were applied to the texts. Naïve Bayes, Neural Networks and support vector machine were used for text classification in this study. Out-of-sample tests showed that there is no evidence indicating that the accuracy of the SVM approach is statistically higher than that of Naïve Bayes or that the accuracy of Naïve Bayes is not statistically higher than that of the NN approach. However, the researchers can conclude that the accuracy of the classification using the SVM approach is statistically higher than the accuracy of the NN approach at the 5% significance level.
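
    The pre-processing steps named in this abstract (tokenizing, stop-word removal, n-grams) followed by a Naïve Bayes classifier can be sketched as below. This is a toy illustration under stated assumptions, not the authors' system: it uses English text, an invented stop-word list and four invented training comments, and the class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy English stop-word list; an assumption made for this example.
STOP_WORDS = {"the", "a", "is", "was", "this", "and"}


def preprocess(text, n=2):
    """Tokenize, drop stop words, and append word n-grams as features."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens + ngrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            features = preprocess(text)
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)
            for feature in preprocess(text):
                if feature in self.vocabulary:  # skip unseen features
                    score += math.log((counts[feature] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training comments, for illustration only.
train_texts = [
    "this movie was great and moving",
    "a great story",
    "the plot was boring",
    "this was a boring and weak movie",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("a great movie"), model.predict("boring plot"))
```

    A real evaluation would hold out part of the data for testing, as the abstract describes, and compare classifiers on that held-out set.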

  5. PubMedPortable: A Framework for Supporting the Development of Text Mining Applications.

    Science.gov (United States)

    Döring, Kersten; Grüning, Björn A; Telukunta, Kiran K; Thomas, Philippe; Günther, Stefan

    2016-01-01

    Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
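
    The idea of pairing a relational store with a full-text index, as described in this abstract, can be sketched with Python's built-in sqlite3 module. This is an assumption-laden toy and not PubMedPortable's actual schema, storage backend or API: the table names, the simple token table standing in for a full-text index, and the three sample citations with placeholder identifiers are all invented.

```python
import sqlite3


def build_index(citations):
    """Store citations in a table and index each title token in a second table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citation (pmid INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE term (token TEXT, pmid INTEGER)")
    conn.execute("CREATE INDEX term_token ON term (token)")
    for pmid, title in citations:
        conn.execute("INSERT INTO citation VALUES (?, ?)", (pmid, title))
        for token in set(title.lower().split()):
            conn.execute("INSERT INTO term VALUES (?, ?)", (token, pmid))
    return conn


def search(conn, word):
    """Return the identifiers of citations whose title contains the word."""
    rows = conn.execute(
        "SELECT c.pmid FROM term t JOIN citation c ON c.pmid = t.pmid "
        "WHERE t.token = ? ORDER BY c.pmid",
        (word.lower(),),
    )
    return [row[0] for row in rows]


# Placeholder records with invented identifiers and titles.
citations = [
    (1, "Pancreatic cancer gene expression"),
    (2, "Text mining of cancer literature"),
    (3, "Protein structure prediction"),
]
conn = build_index(citations)
print(search(conn, "cancer"))
```

    Keeping the citation table and the token index separate lets downstream tools query either structured fields or text, which is the kind of combination the abstract describes.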

  6. EnvMine: A text-mining system for the automatic extraction of contextual information

    Directory of Open Access Journals (Sweden)

    de Lorenzo Victor

    2010-06-01

    Background: For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such a description must be done in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would be difficult to do otherwise. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text, by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distance between the individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical
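
    Two of the ideas in this abstract, finding physicochemical values through their units of measurement and computing the distance between located samples, can be sketched as follows. This is a minimal illustration and not EnvMine's implementation: the short unit list, the regular expression, the sample sentence and the function names are assumptions, and the distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math
import re

# A deliberately short unit list; a real system would need many more units.
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s*(mM|mg/L|m)\b")


def extract_measurements(text):
    """Return (value, unit) pairs for numbers followed by a known unit."""
    return [(float(value), unit) for value, unit in MEASUREMENT.findall(text)]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two coordinate pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Invented sentence, for illustration only.
sentence = ("Samples were collected at 25 m depth, "
            "with 28 mM sulfate and 7.5 mg/L dissolved oxygen.")
measurements = extract_measurements(sentence)
one_degree_at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
print(measurements, one_degree_at_equator)
```

    In practice the coordinates would come from a gazetteer lookup such as the GeoNames database mentioned in the abstract; that lookup is outside the scope of this sketch.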

  7. Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes.

    Science.gov (United States)

    Leeper, Nicholas J; Bauer-Mehren, Anna; Iyer, Srinivasan V; Lependu, Paea; Olson, Cliff; Shah, Nigam H

    2013-01-01

    Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.
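
    The effect sizes in this abstract are reported as odds ratios with confidence intervals. The sketch below shows how an odds ratio and a Wald-type 95% interval are computed from a 2x2 table. Only the group sizes (232 treated patients and 1,160 controls, i.e. 232 × 5) come from the abstract; the event counts are invented for illustration and are not the study's data, and this simple unadjusted interval is an assumption that ignores the matched design the authors may have accounted for.

```python
import math


def odds_ratio_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table."""
    a = events_treated                # treated, with event
    b = n_treated - events_treated    # treated, without event
    c = events_control                # control, with event
    d = n_control - events_control    # control, without event
    odds_ratio = (a * d) / (b * c)
    std_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * std_error)
    upper = math.exp(math.log(odds_ratio) + z * std_error)
    return odds_ratio, lower, upper


n_treated = 232
n_control = n_treated * 5  # 1:5 matching gives 1,160 controls

# Hypothetical event counts, NOT taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(20, n_treated, 100, n_control)
print(n_control, odds_ratio, lower, upper)
```

    An interval that contains 1, as in each result quoted in the abstract, means the data do not show a statistically significant association at that confidence level.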

  8. Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes.

    Directory of Open Access Journals (Sweden)

    Nicholas J Leeper

    Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.

  9. A practical application of text mining to literature on cognitive rehabilitation and enhancement through neurostimulation.

    Science.gov (United States)

    Balan, Puiu F; Gerits, Annelies; Vanduffel, Wim

    2014-01-01

    The exponential growth in publications represents a major challenge for researchers. Many scientific domains, including neuroscience, are not yet fully engaged in exploiting large bodies of publications. In this paper, we promote the idea to partially automate the processing of scientific documents, specifically using text mining (TM), to efficiently review big corpora of publications. The "cognitive advantage" given by TM is mainly related to the automatic extraction of relevant trends from corpora of literature, otherwise impossible to analyze in short periods of time. Specifically, the benefits of TM are increased speed, quality and reproducibility of text processing, boosted by rapid updates of the results. First, we selected a set of TM-tools that allow user-friendly approaches of the scientific literature, and which could serve as a guide for researchers willing to incorporate TM in their work. Second, we used these TM-tools to obtain basic insights into the relevant literature on cognitive rehabilitation (CR) and cognitive enhancement (CE) using transcranial magnetic stimulation (TMS). TM readily extracted the diversity of TMS applications in CR and CE from vast corpora of publications, automatically retrieving trends already described in published reviews. TMS emerged as one of the important non-invasive tools that can both improve cognitive and motor functions in numerous neurological diseases and induce modulations/enhancements of many fundamental brain functions. TM also revealed trends in big corpora of publications by extracting occurrence frequency and relationships of particular subtopics. Moreover, we showed that CR and CE share research topics, both aiming to increase the brain's capacity to process information, thus supporting their integration in a larger perspective. Methodologically, despite limitations of a simple user-friendly approach, TM served well the reviewing process.

  10. A practical application of text mining to literature on cognitive rehabilitation and enhancement through neurostimulation

    Directory of Open Access Journals (Sweden)

    Puiu F Balan

    2014-09-01

    The exponential growth in publications represents a major challenge for researchers. Many scientific domains, including neuroscience, are not yet fully engaged in exploiting large bodies of publications. In this paper, we promote the idea to partially automate the processing of scientific documents, specifically using text mining (TM), to efficiently review big corpora of publications. The cognitive advantage given by TM is mainly related to the automatic extraction of relevant trends from corpora of literature, otherwise impossible to analyze in short periods of time. Specifically, the benefits of TM are increased speed, quality and reproducibility of text processing, boosted by rapid updates of the results. First, we selected a set of TM-tools that allow user-friendly approaches of the scientific literature, and which could serve as a guide for researchers willing to incorporate TM in their work. Second, we used these TM-tools to obtain basic insights into the relevant literature on cognitive rehabilitation (CR) and cognitive enhancement (CE) using transcranial magnetic stimulation (TMS). TM readily extracted the diversity of TMS applications in CR and CE from vast corpora of publications, automatically retrieving trends already described in published reviews. TMS emerged as one of the important non-invasive tools that can both improve cognitive and motor functions in numerous neurological diseases and induce modulations/enhancements of many fundamental brain functions. TM also revealed trends in big corpora of publications by extracting occurrence frequency and relationships of particular subtopics. Moreover, we showed that CR and CE share research topics, both aiming to increase the brain’s capacity to process information, thus supporting their integration in a larger perspective. Methodologically, despite limitations of a simple user-friendly approach, TM served well the reviewing process.

  11. Argo: an integrative, interactive, text mining-based workbench supporting curation

    Science.gov (United States)

    Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia

    2012-01-01

    Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in

  12. Text Mining to inform construction of Earth and Environmental Science Ontologies

    Science.gov (United States)

    Schildhauer, M.; Adams, B.; Rebich Hespanha, S.

    2013-12-01

    There is a clear need for better semantic representation of Earth and environmental concepts, to facilitate more effective discovery and re-use of information resources relevant to scientists doing integrative research. In order to develop general-purpose Earth and environmental science ontologies, however, it is necessary to represent concepts and relationships that span usage across multiple disciplines and scientific specialties. Traditional knowledge modeling through ontologies utilizes expert knowledge but inevitably favors the particular perspectives of the ontology engineers, as well as the domain experts who interacted with them. This often leads to ontologies that lack robust coverage of synonymy, while also missing important relationships among concepts that can be extremely useful for working scientists to be aware of. In this presentation we will discuss methods we have developed that utilize statistical topic modeling on a large corpus of Earth and environmental science articles, to expand coverage and disclose relationships among concepts in the Earth sciences. For our work we collected a corpus of over 121,000 abstracts from many of the top Earth and environmental science journals. We performed latent Dirichlet allocation topic modeling on this corpus to discover a set of latent topics, which consist of terms that commonly co-occur in abstracts. We match terms in the topics to concept labels in existing ontologies to reveal gaps, and we examine which terms are commonly associated in natural language discourse, to identify relationships that are important to formally model in ontologies. Our text mining methodology uncovers significant gaps in the content of some popular existing ontologies, and we show how, through a workflow involving human interpretation of topic models, we can bootstrap ontologies to have much better coverage and richer semantics. Because we base our methods directly on what working scientists are communicating about their

  13. Regulatory RNA-assisted genome engineering in microorganisms.

    Science.gov (United States)

    Si, Tong; HamediRad, Mohammad; Zhao, Huimin

    2015-12-01

    Regulatory RNAs are increasingly recognized and utilized as key modulators of gene expression in diverse organisms. Thanks to their modular and programmable nature, trans-acting regulatory RNAs are especially attractive in genome-scale applications. Here we discuss the recent examples in microbial genome engineering implementing various trans-acting RNA platforms, including sRNA, RNAi, asRNA and CRISPR-Cas. In particular, we focus on how the scalable and multiplex nature of trans-acting RNAs has been used to tackle the challenges in creating genome-wide and combinatorial diversity for functional genomics and metabolic engineering applications. Advances in computational design and context-dependent regulation are also discussed for their contribution in improving fine-tuning capabilities of trans-acting RNAs. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.

    Science.gov (United States)

    Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J

    2017-08-01

    Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. These results highlight the superiority of text mining algorithms applied to electronic
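    The cohort arithmetic and the positive predictive value (PPV) comparison in this abstract can be sketched as follows. The identification counts are taken from the abstract; the helper names `unique_counts` and `ppv` and the chart-review counts (190 confirmed out of 200 reviewed) are hypothetical, chosen only so that the example reproduces a PPV of 0.95.

```python
def unique_counts(n_method_a, n_method_b, n_both):
    """Return how many patients each method identified that the other did not."""
    if n_both > min(n_method_a, n_method_b):
        raise ValueError("overlap cannot exceed either method's total")
    return n_method_a - n_both, n_method_b - n_both


def ppv(true_positives, false_positives):
    """Positive predictive value: confirmed cases / all flagged cases reviewed."""
    reviewed = true_positives + false_positives
    if reviewed == 0:
        raise ValueError("no reviewed cases")
    return true_positives / reviewed


# Counts reported in the abstract (text mining, ICD-9, identified by both).
tas_only_text, tas_only_icd = unique_counts(7115, 8272, 4346)
cad_only_text, cad_only_icd = unique_counts(9247, 6913, 6024)
print("TAS unique to text mining / ICD-9:", tas_only_text, tas_only_icd)
print("CAD unique to text mining / ICD-9:", cad_only_text, cad_only_icd)

# Hypothetical manual review of 200 sampled patients, 190 confirmed.
print("PPV:", ppv(190, 10))
```

    The PPV is computed on the patients that only one method found, because those are the cases where the two approaches disagree.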

  15. The BioLexicon: a large-scale terminological resource for biomedical text mining

    Directory of Open Access Journals (Sweden)

    Thompson Paul

    2011-10-01

    Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. 
In order to foster interoperability, the BioLexicon is

  16. Text mining of cancer-related information: review of current status and future directions.

    Science.gov (United States)

    Spasić, Irena; Livsey, Jacqueline; Keane, John A; Nenadić, Goran

    2014-09-01

    This paper reviews the research literature on text mining (TM) with the aim to find out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research. A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar. A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. 
This issue remains the main
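    The F-measure figures cited in this review combine precision and recall into a single score. A minimal sketch of the standard formula follows; the function name `f_measure` and the precision and recall values are illustrative assumptions, not results from the reviewed studies.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 weights both equally (the common F1 score);
    beta > 1 favours recall, beta < 1 favours precision.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # conventional value when nothing was retrieved correctly
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Illustrative values only: a recognizer with 90% precision and 80% recall.
print(round(f_measure(0.9, 0.8), 3))
```

    Because the harmonic mean is pulled toward the lower of its two inputs, a system cannot reach an F-measure in the 80% to 90% range unless both precision and recall are reasonably high.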

  17. The BioLexicon: a large-scale terminological resource for biomedical text mining

    Science.gov (United States)

    2011-01-01

    Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. 
In order to foster interoperability, the BioLexicon is modelled using the Lexical

  18. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

    Science.gov (United States)

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. 
Although the scope of our annotation is currently limited to a single

  19. Technical Status Report of the Regulatory Assistance Project: October 2001-February 2003

    Energy Technology Data Exchange (ETDEWEB)

    2003-08-01

    This report details the work undertaken from October 2001 to February 2003 by the Regulatory Assistance Project under subcontract to the National Renewable Energy Laboratory. The objectives of this work were to develop regulatory policy options that would reduce the institutional and infrastructure barriers to full-value deployment of distributed power systems. Specific tasks included leading technical workshops on removing or overcoming regulatory barriers to distributed resources for state utility regulators and developing a draft model rule on emission performance standards for distributed generation.

  20. Regulatory framework in assisted reproductive technologies, relevance and main issues.

    OpenAIRE

    Françoise Merlet

    2010-01-01

    Assisted reproductive technologies (ART) have changed life for the past 25 years and many ethical and social issues have emerged following this new method of conception. In order to protect individuals against scientific and ethical abuses without inhibiting scientific progress, a specific legal framework is necessary. The first French law on Bioethics was voted after an extensive debate in 1994 then reviewed in 2004. This review previously scheduled every five years is currently being discus...

  1. Automated assessment of patients' self-narratives for posttraumatic stress disorder screening using natural language processing and text mining

    NARCIS (Netherlands)

    He, Qiwei; Veldkamp, Bernard P.; Glas, Cornelis A.W.; de Vries, Theo

    2017-01-01

    Patients’ narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four

  2. Text Mining for Precision Medicine: Bringing structure to EHRs and biomedical literature to understand genes and health

    Science.gov (United States)

    Simmons, Michael; Singhal, Ayush; Lu, Zhiyong

    2018-01-01

    The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text — found in biomedical publications and clinical notes — is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine. PMID:27807747

  3. The Feasibility of Using Large-Scale Text Mining to Detect Adverse Childhood Experiences in a VA-Treated Population.

    Science.gov (United States)

    Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H

    2015-12-01

    Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf war veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.

  4. Impact of Text-Mining and Imitating Strategies on Lexical Richness, Lexical Diversity and General Success in Second Language Writing

    Science.gov (United States)

    Çepni, Sevcan Bayraktar; Demirel, Elif Tokdemir

    2016-01-01

    This study aimed to find out the impact of "text mining and imitating" strategies on lexical richness, lexical diversity and general success of students in their compositions in second language writing. The participants were 98 students studying their first year in Karadeniz Technical University in English Language and Literature…

  5. Text mining to decipher free-response consumer complaints: insights from the NHTSA vehicle owner's complaint database.

    Science.gov (United States)

    Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D

    2014-09-01

    This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.

  6. An Enhanced Text-Mining Framework for Extracting Disaster Relevant Data through Social Media and Remote Sensing Data Fusion

    Science.gov (United States)

    Scheele, C. J.; Huang, Q.

    2016-12-01

    In the past decade, the rise in social media has led to the development of a vast number of social media services and applications. Disaster management represents one of such applications leveraging massive data generated for event detection, response, and recovery. In order to find disaster relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in language used on social media. To improve current methods, the enhanced text-mining framework is proposed to incorporate location information from social media and authoritative remote sensing datasets for detecting disaster relevant social media posts, which are determined by assessing the textual content using common text mining methods and how the post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarth™. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.

  7. Examining Mobile Learning Trends 2003-2008: A Categorical Meta-Trend Analysis Using Text Mining Techniques

    Science.gov (United States)

    Hung, Jui-Long; Zhang, Ke

    2012-01-01

    This study investigated the longitudinal trends of academic articles in Mobile Learning (ML) using text mining techniques. One hundred and nineteen (119) refereed journal articles and proceedings papers from the SCI/SSCI database were retrieved and analyzed. The taxonomies of ML publications were grouped into twelve clusters (topics) and four…

  8. Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

    Science.gov (United States)

    Simmons, Michael; Singhal, Ayush; Lu, Zhiyong

    2016-01-01

    The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text, found in biomedical publications and clinical notes, is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.

  9. Text Mining et annotation sémantique pour l’Information Scientifique

    OpenAIRE

    2013-01-01

    An example of applying semantic enrichment and scientific literature analysis technologies to support a scientific review of the role of transgenesis in combating diseases of major field crops.

  10. Regulatory framework in assisted reproductive technologies, relevance and main issues.

    Directory of Open Access Journals (Sweden)

    Françoise Merlet

    2010-01-01

    Assisted reproductive technologies (ART) have changed life for the past 25 years and many ethical and social issues have emerged following this new method of conception. In order to protect individuals against scientific and ethical abuses without inhibiting scientific progress, a specific legal framework is necessary. The first French law on Bioethics was voted after an extensive debate in 1994 then reviewed in 2004. This review previously scheduled every five years is currently being discussed. Legal provisions applying to ART are part of a large framework including the protection of the patients' rights and biomedical research. The key principles consist of respect for human life and ban on commercial practices of human body parts, eugenic practices and any kind of cloning. These key principles apply to ART. Donation is anonymous and free. Created in 2004, the Agence de la biomédecine is a government agency and one of the main tools of the French regulations. The missions focus on improving the quality and the safety of the management of ART. Evaluation of activities is available to all from the annual report. The agency represents the French competent authority for medical and scientific aspects of ART. Substantial differences in European legislations exist from the open-up "laissez faire" to the most restrictive one. As a consequence a large reproductive tourism has developed particularly for egg donation or surrogacy. The medical and ethical conditions of management of patients and donors represent the main critical points. In order to avoid ethical abuses, homogenization regarding the key principles is necessary in Europe. It is an opportunity to reassert that human body parts should not be a source of financial gain.

  11. Regulatory framework in assisted reproductive technologies, relevance and main issues.

    Science.gov (United States)

    Merlet, Françoise

    2009-01-01

    Assisted reproductive technologies (ART) have changed life for the past 25 years and many ethical and social issues have emerged following this new method of conception. In order to protect individuals against scientific and ethical abuses without inhibiting scientific progress, a specific legal framework is necessary. The first French law on Bioethics was voted after an extensive debate in 1994 then reviewed in 2004. This review previously scheduled every five years is currently being discussed. Legal provisions applying to ART are part of a large framework including the protection of the patients' rights and biomedical research. The key principles consist of respect for human life and ban on commercial practices of human body parts, eugenic practices and any kind of cloning. These key principles apply to ART. Donation is anonymous and free. Created in 2004, the Agence de la biomédecine is a government agency and one of the main tools of the French regulations. The missions focus on improving the quality and the safety of the management of ART. Evaluation of activities is available to all from the annual report. The agency represents the French competent authority for medical and scientific aspects of ART. Substantial differences in European legislations exist from the open-up "laissez faire" to the most restrictive one. As a consequence a large reproductive tourism has developed particularly for egg donation or surrogacy. The medical and ethical conditions of management of patients and donors represent the main critical points. In order to avoid ethical abuses, homogenization regarding the key principles is necessary in Europe. It is an opportunity to reassert that human body parts should not be a source of financial gain.

  12. The contribution of the vaccine adverse event text mining system to the classification of possible Guillain-Barré syndrome reports.

    Science.gov (United States)

    Botsis, T; Woo, E J; Ball, R

    2013-01-01

    We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority.
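
    The vector space model mentioned in this abstract can be illustrated with a minimal sketch, assuming each report and the case definition are reduced to bags of terms and compared by cosine similarity. The term lists below are hypothetical placeholders, not actual VaeTM features or MedDRA Preferred Terms, and the abstract does not state which weighting or similarity measure the authors used.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term lists, for illustration only.
case_definition = Counter(["weakness", "areflexia", "bilateral"])
report = Counter(["weakness", "bilateral", "fever"])
score = cosine_similarity(case_definition, report)
```

    A higher score means the report shares more of its terms with the case definition; ranking reports by this score and sweeping a threshold is one common way to obtain an AUC figure.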

  13. The Contribution of the Vaccine Adverse Event Text Mining System to the Classification of Possible Guillain-Barré Syndrome Reports

    Science.gov (United States)

    Botsis, T.; Woo, E. J.; Ball, R.

    2013-01-01

    Background: We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. Objective: To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). Methods: We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. Results: MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. Conclusion: For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority. PMID:23650490

  14. Automated Assessment of Patients' Self-Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and Text Mining.

    Science.gov (United States)

    He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo

    2017-03-01

    Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
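
    The n-gram representation referred to above can be sketched as follows. This is an illustrative assumption about the feature extraction step only (lowercasing, whitespace tokenization, unigrams plus bigrams); the abstract does not describe the authors' preprocessing, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count unigrams up to max_n-grams in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Invented example sentence, not taken from the study's data.
features = ngram_features("I feel afraid")
```

    Each narrative then becomes a sparse count vector over these tuples, which any of the classifiers named in the abstract could take as input.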

  15. A Digital Humanities Approach to the History of Science Eugenics Revisited in Hidden Debates by Means of Semantic Text Mining

    OpenAIRE

    Huijnen, Pim; Laan, Fons; de Rijke, Maarten; Pieters, Toine

    2014-01-01

    Comparative historical research on the intensity, diversity and fluidity of public discourses has been severely hampered by the extraordinary task of manually gathering and processing large sets of opinionated data in news media in different countries. At most 50,000 documents have been systematically studied in a single comparative historical project in the subject area of heredity and eugenics. Digital techniques, like the text mining tools WAHSP and BILAND we have developed in two succ...

  16. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.

    Directory of Open Access Journals (Sweden)

    Allan Peter Davis

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency.

  17. E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter

    OpenAIRE

    Lazard, Allison J; Saffer, Adam J; Wilcox, Gary B; Chung, Arnold DongWoo; Mackert, Michael S; Bernhardt, Jay M

    2016-01-01

    Background: As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. Objective: This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. Methods: Text mining uncovered key patterns and important topics for e-cigarettes on Twitter. SAS Text Miner 12.1 software (SAS Institute Inc) was used for descriptiv...

  18. Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database

    Science.gov (United States)

    Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.

    2013-01-01

    The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709
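
    Mean average precision, one of the evaluation parameters named in this abstract, can be computed as in the sketch below. It assumes articles are sorted by descending score and that each carries a binary curator judgment; how the DRS itself is computed is not described here, so the scores and judgments in the example are invented.

```python
def rank_by_score(scored_articles):
    """Sort (score, is_relevant) pairs by descending score; keep relevance."""
    ordered = sorted(scored_articles, key=lambda pair: pair[0], reverse=True)
    return [relevant for _, relevant in ordered]


def average_precision(ranked_relevance):
    """Mean of precision@k taken at each rank k holding a relevant item."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average of average_precision over several ranked lists."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Invented scores and relevance judgments, for illustration only.
ranking = rank_by_score([(40, False), (95, True), (10, True)])
ap = average_precision(ranking)
```

    A ranking that places relevant articles first yields an average precision of 1.0, so the measure rewards a score that pushes curatable articles toward the top of the queue.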

  19. Studies on medicinal herbs for cognitive enhancement based on the text mining of Dongeuibogam and preliminary evaluation of its effects.

    Science.gov (United States)

    Pak, Malk Eun; Kim, Yu Ri; Kim, Ha Neui; Ahn, Sung Min; Shin, Hwa Kyoung; Baek, Jin Ung; Choi, Byung Tae

    2016-02-17

    In literature on Korean medicine, Dongeuibogam (Treasured Mirror of Eastern Medicine), published in 1613, represents the overall results of the traditional medicines of North-East Asia based on prior medicinal literature of this region. We utilized this medicinal literature by text mining to establish a list of candidate herbs for cognitive enhancement in the elderly and then performed an evaluation of their effects. Text mining was performed for selection of candidate herbs. Cell viability was determined in HT22 hippocampal cells and immunohistochemistry and behavioral analysis was performed in a kainic acid (KA) mice model in order to observe alterations of hippocampal cells and cognition. Twenty four herbs for cognitive enhancement in the elderly were selected by text mining of Dongeuibogam. In HT22 cells, pretreatment with 3 candidate herbs resulted in significantly reduced glutamate-induced cell death. Panax ginseng was the most neuroprotective herb against glutamate-induced cell death. In the hippocampus of a KA mice model, pretreatment with 11 candidate herbs resulted in suppression of caspase-3 expression. Treatment with 7 candidate herbs resulted in significantly enhanced expression levels of phosphorylated cAMP response element binding protein. Number of proliferated cells indicated by BrdU labeling was increased by treatment with 10 candidate herbs. Schisandra chinensis was the most effective herb against cell death and proliferation of progenitor cells and Rehmannia glutinosa in neuroprotection in the hippocampus of a KA mice model. In a KA mice model, we confirmed improved spatial and short memory by treatment with the 3 most effective candidate herbs and these recovered functions were involved in a higher number of newly formed neurons from progenitor cells in the hippocampus. These established herbs and their combinations identified by text-mining technique and evaluation for effectiveness may have value in further experimental and clinical

  20. Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

    Science.gov (United States)

    Chang, Bingguo; Chen, Xiaofei

    2018-05-01

    Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor reports the liver indicators and assesses the patient's condition according to the description in the ultrasound report. With the rapid increase in the volume of ultrasound report data, the workload of physicians who manually interpret ultrasound results increases significantly. In this paper, we use spectral clustering to cluster the descriptions in ultrasound reports and automatically generate the ultrasound diagnosis by machine learning. 110 ultrasound examination reports of chronic liver disease were selected as test samples in this experiment, and the results obtained by spectral clustering were compared with those of the k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of the k-means clustering algorithm, and that the method provides powerful ultrasound-assisted diagnosis for patients with chronic liver disease.
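
    Accuracy for a clustering result is usually computed after mapping cluster identifiers onto reference labels. The sketch below shows one common way to do that, by trying every one-to-one mapping and keeping the best; it is an assumption about the evaluation, not the authors' stated procedure, and it requires that there be no more clusters than reference classes. As a plausibility check on the reported figure, 102 correct out of 110 reports would round to 92.73%.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Best accuracy over all one-to-one mappings of clusters to classes.

    Assumes the number of distinct clusters does not exceed the number
    of distinct classes; brute force, so only suitable for few classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = 0
    for assignment in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        correct = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best = max(best, correct)
    return best / len(true_labels)


# Invented labels, for illustration only.
accuracy = clustering_accuracy(["a", "a", "b", "b"], [1, 1, 0, 1])
reported_check = round(102 / 110 * 100, 2)
```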

  1. miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

    Science.gov (United States)

    Gupta, Samir; Ross, Karen E; Tudor, Catalina O; Wu, Cathy H; Schmidt, Carl J; Vijay-Shanker, K

    2016-04-29

    MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to

  2. Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.

    Science.gov (United States)

    Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C

    2018-08-01

    Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.
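
    The step of intersecting the two text-mined gene sets amounts to a set intersection, sketched below. PTGS2 is named in the abstract; the other gene symbols are placeholders chosen for illustration and are not claimed to be among the 447 shared genes.

```python
def shared_genes(om_genes, wound_healing_genes):
    """Return the sorted symbols present in both text-mined gene lists."""
    return sorted(set(om_genes) & set(wound_healing_genes))


# Placeholder lists; only PTGS2 is taken from the abstract.
om = ["PTGS2", "TNF", "IL6"]
wound_healing = ["IL6", "PTGS2", "EGF"]
common = shared_genes(om, wound_healing)
```

    The resulting list is what would then be passed to enrichment analysis and queried against a drug-gene interaction resource.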

  3. PubMed-EX: a web browser extension to enhance PubMed search with text mining features.

    Science.gov (United States)

    Tsai, Richard Tzong-Han; Dai, Hong-Jie; Lai, Po-Ting; Huang, Chi-Hsin

    2009-11-15

    PubMed-EX is a browser extension that marks up PubMed search results with additional text-mining information. PubMed-EX's page mark-up, which includes section categorization and gene/disease and relation mark-up, can help researchers to quickly focus on key terms and provide additional information on them. All text processing is performed server-side, freeing up user resources. PubMed-EX is freely available at http://bws.iis.sinica.edu.tw/PubMed-EX and http://iisr.cse.yzu.edu.tw:8000/PubMed-EX/.

  4. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action.

    Science.gov (United States)

    Papamokos, George; Silins, Ilona

    2016-01-01

    There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens, however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens.

  5. Life priorities in the HIV-positive Asians: a text-mining analysis in young vs. old generation.

    Science.gov (United States)

    Chen, Wei-Ti; Barbour, Russell

    2017-04-01

    HIV/AIDS is one of the most urgent and challenging public health issues, especially since it is now considered a chronic disease. In this project, we used text mining techniques to extract meaningful words and word patterns from 45 transcribed in-depth interviews of people living with HIV/AIDS (PLWHA) conducted in Taipei, Beijing, Shanghai, and San Francisco from 2006 to 2013. Text mining analysis can predict whether an emerging field will become a long-lasting source of academic interest or whether it is simply a passing source of interest that will soon disappear. The data were analyzed by age group (45 and older vs. 44 and younger). The highest ranking fragments in the order of frequency were: "care", "daughter", "disease", "family", "HIV", "hospital", "husband", "medicines", "money", "people", "son", "tell/disclosure", "thought", "want", and "years". Participants in the 44-year-old and younger group were focused mainly on disease disclosure, their families, and their financial condition. In older PLWHA, social supports were one of the main concerns. In this study, we learned that different age groups perceive the disease differently. Therefore, when designing interventions, researchers should consider tailoring an intervention to a specific population to help PLWHA achieve a better quality of life. Promoting self-management can be an effective strategy for every encounter with HIV-positive individuals.
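
    The comparison of term frequencies across the two age groups can be sketched as below. The age cut-off follows the abstract (45 and older vs. 44 and younger); the tokenization, the function names and the two sample transcripts are assumptions made for illustration and do not come from the study.

```python
from collections import Counter, defaultdict


def age_group(age):
    """Assign the age bands used in the abstract."""
    return "45 and older" if age >= 45 else "44 and younger"


def term_counts_by_group(interviews, terms):
    """Count occurrences of the given terms per age group.

    interviews: iterable of (age, transcript) pairs.
    """
    wanted = set(terms)
    counts = defaultdict(Counter)
    for age, transcript in interviews:
        for token in transcript.lower().split():
            token = token.strip(".,;:!?\"'")  # drop surrounding punctuation
            if token in wanted:
                counts[age_group(age)][token] += 1
    return counts


# Invented transcripts, for illustration only.
interviews = [
    (30, "My family worries about money."),
    (60, "The hospital gives me medicines, and my family helps."),
]
counts = term_counts_by_group(interviews, ["family", "money", "hospital", "medicines"])
```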

  6. A preliminary approach to creating an overview of lactoferrin multi-functionality utilizing a text mining method.

    Science.gov (United States)

    Shimazaki, Kei-ichi; Kushida, Tatsuya

    2010-06-01

    Lactoferrin is a multi-functional metal-binding glycoprotein that exhibits many biological functions of interest to many researchers from the fields of clinical medicine, dentistry, pharmacology, veterinary medicine, nutrition and milk science. To date, a number of academic reports concerning the biological activities of lactoferrin have been published and are easily accessible through public data repositories. However, as the literature is expanding daily, this presents challenges in understanding the larger picture of lactoferrin function and mechanisms. In order to overcome the "analysis paralysis" associated with lactoferrin information, we attempted to apply a text mining method to the accumulated lactoferrin literature. To this end, we used the information extraction system GENPAC (provided by Nalapro Technologies Inc., Tokyo). This information extraction system uses natural language processing and text mining technology. This system analyzes the sentences and titles from abstracts stored in the PubMed database, and can automatically extract binary relations that consist of interactions between genes/proteins, chemicals and diseases/functions. We expect that such information visualization analysis will be useful in determining novel relationships among a multitude of lactoferrin functions and mechanisms. We have demonstrated the utilization of this method to find pathways of lactoferrin participation in neovascularization, Helicobacter pylori attack on gastric mucosa, atopic dermatitis and lipid metabolism.

  7. Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action

    Science.gov (United States)

    Papamokos, George; Silins, Ilona

    2016-01-01

    There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens, however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens. PMID:27625608

  8. Assisting IAEA Member States to Strengthen Regulatory Control, Particularly in the Medical Area

    International Nuclear Information System (INIS)

    Johnston, P.

    2016-01-01

    As per its Statute and Mandate, the IAEA is developing Safety Standards and is also providing assistance for their application in Member States. One target and very large audience of this programme is the community of national regulatory bodies for radiation safety, expected to be established in all 168 Member States. Ionizing radiation is being used throughout the world in medical practices, and medical exposure is the most significant manmade source of exposure to the population from ionizing radiation. Radiation accidents involving medical uses have accounted for more injuries and early acute health effects than any other type of radiation accident, including accidents at nuclear facilities. With the constant emergence of new technologies using ionizing radiation for medical diagnosis and treatment, there are ongoing challenges for regulatory bodies. The presentation will highlight some figures related to medical exposure worldwide, and then it will introduce the main safety standards and other publications developed specifically for regulatory bodies and focusing on medical practices. It will also highlight the most important and recent mechanisms (tools, peer reviews and advisory services, training courses, networks) that the Agency is offering to its Member States in order to cope with the main challenges worldwide, thus contributing to the efficiency and effectiveness of the regulatory oversight of medical facilities and activities. (author)

  9. U.S. Nuclear Regulatory Commission nuclear safety assistance to the CEE and NIS countries

    International Nuclear Information System (INIS)

    Blaha, J.

    2001-01-01

    NRC participates in bilateral and multilateral efforts to strengthen the regulatory authorities of countries in which Soviet design NPPs are operated. Countries involved are the New Independent States of the Soviet Union (Armenia, Kazakhstan, Russia and Ukraine) and of Central and Eastern Europe (Bulgaria, Czech Republic, Hungary, Lithuania and Slovak Republic). NRC's goal is to see that its counterparts receive the basic tools, knowledge and understanding needed to exercise effective regulatory oversight, consistent with internationally accepted norms and standards. The bilateral assistance started in 1991. $44 mill. are provided to the countries. The multilateral activities NRC participates in include: H-7 Nuclear Safety Working Group, EBRD - Administered Nuclear Safety Account and Chernobyl Sarcophagus Fund and IAEA

  10. Development and testing of a text-mining approach to analyse patients' comments on their experiences of colorectal cancer care.

    Science.gov (United States)

    Wagland, Richard; Recio-Saucedo, Alejandra; Simon, Michael; Bracher, Michael; Hunt, Katherine; Foster, Claire; Downing, Amy; Glaser, Adam; Corner, Jessica

    2016-08-01

    Quality of cancer care may greatly impact on patients' health-related quality of life (HRQoL). Free-text responses to patient-reported outcome measures (PROMs) provide rich data but analysis is time and resource-intensive. This study developed and tested a learning-based text-mining approach to facilitate analysis of patients' experiences of care and develop an explanatory model illustrating impact on HRQoL. Respondents to a population-based survey of colorectal cancer survivors provided free-text comments regarding their experience of living with and beyond cancer. An existing coding framework was tested and adapted, which informed learning-based text mining of the data. Machine-learning algorithms were trained to identify comments relating to patients' specific experiences of service quality, which were verified by manual qualitative analysis. Comparisons between coded retrieved comments and a HRQoL measure (EQ5D) were explored. The survey response rate was 63.3% (21 802/34 467), of which 25.8% (n=5634) participants provided free-text comments. Of retrieved comments on experiences of care (n=1688), over half (n=1045, 62%) described positive care experiences. Most negative experiences concerned a lack of post-treatment care (n=191, 11% of retrieved comments) and insufficient information concerning self-management strategies (n=135, 8%) or treatment side effects (n=160, 9%). Associations existed between HRQoL scores and coded algorithm-retrieved comments. Analysis indicated that the mechanism by which service quality impacted on HRQoL was the extent to which services prevented or alleviated challenges associated with disease and treatment burdens. Learning-based text mining techniques were found useful and practical tools to identify specific free-text comments within a large dataset, facilitating resource-efficient qualitative analysis. This method should be considered for future PROM analysis to inform policy and practice. Study findings indicated that

  11. A Study on Environmental Research Trends Using Text-Mining Method - Focus on Spatial information and ICT -

    Science.gov (United States)

    Lee, M. J.; Oh, K. Y.; Joung-ho, L.

    2016-12-01

    Many recent studies in various fields have used text-mining analysis to examine interactions between entities. In this paper, we aimed to quantitatively analyse research trends in environmental research relating to either spatial information or ICT (Information and Communications Technology) using text-mining analysis. To do this, we applied a low-dimensional embedding method, clustering analysis, and association rules to find meaningful associative patterns among key words that frequently appeared in the articles. Because the authors suppose that KCI (Korea Citation Index) articles reflect academic demands, a total of 1228 KCI articles published from 1996 to 2015 were reviewed and analysed by the text-mining method. First, we retrieved KCI articles from the NDSL (National Discovery for Science Leaders) site. We then pre-processed the key words selected from the abstracts and classified them into separable sectors. We investigated the appearance rates and association rules of key words for articles in the two fields: spatial information and ICT. To detect historical trends, the analysis was conducted separately for four periods: 1996-2000, 2001-2005, 2006-2010 and 2011-2015. These analyses were conducted using R software. As a result, we confirmed that environmental research relating to spatial information mainly focused on fields such as 'GIS (35%)', 'Remote-Sensing (25%)' and 'environmental theme map (15.7%)'. Next, 'ICT technology (23.6%)', 'ICT service (5.4%)', 'mobile (24%)', 'big data (10%)' and 'AI (7%)' are primarily emerging from environmental research relating to ICT. Thus, from the analysis results, this paper asserts that research trends and academic progress are well structured for reviewing recent spatial information and ICT technology, and that the outcomes of the analysis can serve as adequate guidelines for establishing environmental policies and strategies.
KEY WORDS: Big data, Text-mining, Environmental research, Spatial-information, ICT Acknowledgements: The
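
The association-rule step mentioned in this abstract can be illustrated with a small sketch. The study itself used R; the Python below is only an illustration, the keyword sets are invented, and the function name `pair_rules` and the `min_support` threshold are assumptions rather than anything reported by the authors. It computes support, confidence and lift for keyword pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets, one per article (not data from the study).
articles = [
    {"GIS", "remote sensing", "land cover"},
    {"GIS", "remote sensing"},
    {"GIS", "theme map"},
    {"big data", "mobile"},
    {"GIS", "big data"},
]


def pair_rules(keyword_sets, min_support=0.2):
    """Return (antecedent, consequent, support, confidence, lift) for keyword pairs."""
    n = len(keyword_sets)
    # How many articles contain each keyword, and each unordered keyword pair.
    single = Counter(k for s in keyword_sets for k in s)
    pairs = Counter(p for s in keyword_sets for p in combinations(sorted(s), 2))
    rules = []
    for (a, b), count in pairs.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            lift = confidence / (single[y] / n)
            rules.append((x, y, support, confidence, lift))
    return rules


rules = pair_rules(articles)
for x, y, support, confidence, lift in sorted(rules, key=lambda r: -r[4]):
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that two key words co-occur more often than would be expected if they were independent.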

  12. Visualization and Analysis of a Cardio Vascular Disease- and MUPP1-related Biological Network combining Text Mining and Data Warehouse Approaches

    Directory of Open Access Journals (Sweden)

    Sommer Björn

    2010-03-01

    Detailed investigation of socially important diseases with modern experimental methods has resulted in the generation of large volumes of valuable data. However, analysis and interpretation of these data require the application of efficient computational techniques and systems biology approaches. In particular, techniques allowing the reconstruction of associative networks of various biological objects and events can be useful. In this publication, the combination of different techniques to create such a network associated with an abstract cell environment is discussed in order to gain insights into the functional as well as spatial interrelationships. It is shown that experimentally gained knowledge enriched with data warehouse content and text mining data can be used for the reconstruction and localization of a cardiovascular disease developing network beginning with MUPP1/MPDZ (multi-PDZ domain protein).

  13. E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter

    Science.gov (United States)

    2016-01-01

    Background: As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. Objective: This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. Methods: Text mining uncovered key patterns and important topics for e-cigarettes on Twitter. SAS Text Miner 12.1 software (SAS Institute Inc) was used for descriptive text mining to reveal the primary topics from tweets collected from March 24, 2015, to July 3, 2015, using a Python script in conjunction with Twitter’s streaming application programming interface. A total of 18 keywords related to e-cigarettes were used and resulted in a total of 872,544 tweets that were sorted into overarching themes through a text topic node for tweets (126,127) and retweets (114,451) that represented more than 1% of the conversation. Results: While some of the final themes were marketing-focused, many topics represented diverse proponent and user conversations that included discussion of policies, personal experiences, and the differentiation of e-cigarettes from traditional tobacco, often by pointing to the lack of evidence for the harm or risks of e-cigarettes or taking the position that e-cigarettes should be promoted as smoking cessation devices. Conclusions: These findings reveal that unique, large-scale public conversations are occurring on Twitter alongside e-cigarette advertising and promotion. Proponents and users are turning to social media to share knowledge, experience, and questions about e-cigarette use. Future research should focus on these unique conversations to understand how they influence attitudes towards and use of e-cigarettes. PMID:27956376

  14. E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter.

    Science.gov (United States)

    Lazard, Allison J; Saffer, Adam J; Wilcox, Gary B; Chung, Arnold DongWoo; Mackert, Michael S; Bernhardt, Jay M

    2016-12-12

    As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. Text mining uncovered key patterns and important topics for e-cigarettes on Twitter. SAS Text Miner 12.1 software (SAS Institute Inc) was used for descriptive text mining to reveal the primary topics from tweets collected from March 24, 2015, to July 3, 2015, using a Python script in conjunction with Twitter's streaming application programming interface. A total of 18 keywords related to e-cigarettes were used and resulted in a total of 872,544 tweets that were sorted into overarching themes through a text topic node for tweets (126,127) and retweets (114,451) that represented more than 1% of the conversation. While some of the final themes were marketing-focused, many topics represented diverse proponent and user conversations that included discussion of policies, personal experiences, and the differentiation of e-cigarettes from traditional tobacco, often by pointing to the lack of evidence for the harm or risks of e-cigarettes or taking the position that e-cigarettes should be promoted as smoking cessation devices. These findings reveal that unique, large-scale public conversations are occurring on Twitter alongside e-cigarette advertising and promotion. Proponents and users are turning to social media to share knowledge, experience, and questions about e-cigarette use. Future research should focus on these unique conversations to understand how they influence attitudes towards and use of e-cigarettes. ©Allison J Lazard, Adam J Saffer, Gary B Wilcox, Arnold DongWoo Chung, Michael S Mackert, Jay M Bernhardt. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 12.12.2016.

  15. Trends in HIV Terminology: Text Mining and Data Visualization Assessment of International AIDS Conference Abstracts Over 25 Years.

    Science.gov (United States)

    Dancy-Scott, Nicole; Dutcher, Gale A; Keselman, Alla; Hochstein, Colette; Copty, Christina; Ben-Senia, Diane; Rajan, Sampada; Asencio, Maria Guadalupe; Choi, Jason Jongwon

    2018-05-04

    The language encompassing health conditions can also influence behaviors that affect health outcomes. Few published quantitative studies have been conducted that evaluate HIV-related terminology changes over time. To expand this research, this study included an analysis of a dataset of abstracts presented at the International AIDS Conference (IAC) from 1989 to 2014. These abstracts reflect the global response to HIV over 25 years. Two powerful methodologies were used to evaluate the dataset: text mining to convert the unstructured information into structured data for analysis and data visualization to represent the data visually to assess trends. The purpose of this project was to evaluate the evolving use of HIV-related language in abstracts presented at the IAC from 1989 to 2014. Over 80,000 abstracts were obtained from the International AIDS Society and imported into a Microsoft SQL Server database for data processing and text mining analyses. A text mining module within the KNIME Analytics Platform, an open source software, was then used to mine the partially processed data to create a terminology corpus of key HIV terms. Subject matter experts grouped the terms into categories. Tableau, a data visualization software, was used to visualize the frequency metrics associated with the terms as line graphs and word clouds. The visualized dashboards were reviewed to discern changes in terminology use across IAC years. The major findings identify trends in HIV-related terminology over 25 years. The term "AIDS epidemic" was dominantly used from 1989 to 1991 and then declined in use. In contrast, use of the term "HIV epidemic" increased through 2014. Beginning in the mid-1990s, the term "treatment experienced" appeared with increasing frequency in the abstracts. Use of terms identifying individuals as "carriers or victims" of HIV rarely appeared after 2008. Use of the terms "HIV positive" and "HIV infected" peaked in the early-1990s and then declined in use. 
The terms

  16. Text mining and natural language processing approaches for automatic categorization of lay requests to web-based expert forums.

    Science.gov (United States)

    Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm

    2009-07-22

    Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called "ask the doctor" services. The aim was to automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website "Rund ums Baby" ("Everything about Babies") into one or more of 38 categories belonging to two dimensions ("subject matter" and "expectations"). After creating start and synonym lists, we calculated the average Cramer's V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of best regression models, for any request the probability of belonging to each of the 38 different categories, with a cutoff of 50%. Recall and precision of a test sample were calculated as a measure of quality for the automatic classification. According to the manual classification of 988 documents, 102 (10%) documents fell into the category "in vitro fertilization (IVF)," 81 (8%) into the category "ovulation," 79 (8%) into "cycle," and 57 (6%) into "semen analysis." These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as "general information" and 351 (36%) as a wish for "treatment recommendations." The generation of indicator variables based on the chi-square analysis and Cramer's V proved to be the best approach for automatic classification in about half of the categories. In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) out of the 38
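
As an illustration of the word-category association measure named in this abstract, the sketch below computes Cramer's V from a contingency table. It is a generic textbook formulation, not the authors' code. The totals (988 requests, 102 of them in the IVF category) come from the abstract, but the split by presence of a word is invented for the example.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Rows: request contains the word / does not contain it.
# Columns: request is in the IVF category / is not.
# The column totals (102 and 886) match the abstract; the row split is hypothetical.
word_vs_category = [
    [40, 10],
    [62, 876],
]
print(f"Cramer's V = {cramers_v(word_vs_category):.3f}")
```

Values near 0 indicate no association between a word and a category, and values near 1 indicate that the word almost determines the category.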

  17. Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system.

    Science.gov (United States)

    Tudor, Catalina O; Ross, Karen E; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N

    2015-01-01

    Protein phosphorylation is a reversible post-translational modification where a protein kinase adds a phosphate group to a protein, potentially regulating its function, localization and/or activity. Phosphorylation can affect protein-protein interactions (PPIs), abolishing interaction with previous binding partners or enabling new interactions. Extracting phosphorylation information coupled with PPI information from the scientific literature will facilitate the creation of phosphorylation interaction networks of kinases, substrates and interacting partners, toward knowledge discovery of functional outcomes of protein phosphorylation. Increasingly, PPI databases are interested in capturing the phosphorylation state of interacting partners. We have previously developed the eFIP (Extracting Functional Impact of Phosphorylation) text mining system, which identifies phosphorylated proteins and phosphorylation-dependent PPIs. In this work, we present several enhancements for the eFIP system: (i) text mining for full-length articles from the PubMed Central open-access collection; (ii) the integration of the RLIMS-P 2.0 system for the extraction of phosphorylation events with kinase, substrate and site information; (iii) the extension of the PPI module with new trigger words/phrases describing interactions and (iv) the addition of the iSimp tool for sentence simplification to aid in the matching of syntactic patterns. We enhance the website functionality to: (i) support searches based on protein roles (kinases, substrates, interacting partners) or using keywords; (ii) link protein entities to their corresponding UniProt identifiers if mapped and (iii) support visual exploration of phosphorylation interaction networks using Cytoscape. The evaluation of eFIP on full-length articles achieved 92.4% precision, 76.5% recall and 83.7% F-measure on 100 article sections. 
To demonstrate eFIP for knowledge extraction and discovery, we constructed phosphorylation-dependent interaction
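
The reported F-measure can be checked from the reported precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall, beta = 1), which is consistent with the figures given in the abstract.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for eFIP on 100 article sections.
f1 = f_measure(0.924, 0.765)
print(f"{f1:.1%}")  # prints 83.7%
```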

  18. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Directory of Open Access Journals (Sweden)

    Schomburg Dietmar

    2010-07-01

    Background: The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so that methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description: Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions: The presented algorithm delivers a considerable amount of information and therefore may help to accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. 
The source code of the algorithm is provided under the GNU General Public
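
To give a feel for what a rule-based extraction of kinetic parameters can look like, here is a minimal sketch built on a single regular expression. It is not the KID algorithm: the pattern, the short parameter and unit lists, and the example sentence are all assumptions made for illustration, and a real system would need dictionaries and many more rules.

```python
import re

# Small illustrative vocabularies; longer alternatives come first so that
# "kcat/KM" is tried before "kcat".
PARAM = r"(?P<param>kcat/KM|Vmax|IC50|kcat|KM|Km|Ki|Kd)"
VALUE = r"(?P<value>\d+(?:\.\d+)?)"
UNIT = r"(?P<unit>mM|uM|nM|s-1|min-1)"

# Matches phrases such as "Km of 0.45 mM" or "kcat = 12 s-1".
PATTERN = re.compile(
    PARAM + r"\s*(?:value\s*)?(?:of|was|is|=)\s*" + VALUE + r"\s*" + UNIT
)


def extract_kinetics(text):
    """Return (parameter, value, unit) triples found in the text."""
    return [
        (m["param"], float(m["value"]), m["unit"])
        for m in PATTERN.finditer(text)
    ]


# Hypothetical sentence, not taken from any abstract.
sentence = "The enzyme showed a Km of 0.45 mM and a kcat of 12 s-1 at pH 7.5."
print(extract_kinetics(sentence))
# [('Km', 0.45, 'mM'), ('kcat', 12.0, 's-1')]
```

A pattern this narrow misses most real phrasings, which is one reason recall and precision in such systems vary by parameter category.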

  19. Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

    Directory of Open Access Journals (Sweden)

    André SANTOS

    2012-07-01

    Scientific publications are the main vehicle to disseminate information in the field of biotechnology for wastewater treatment. Indeed, the new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, error-prone and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs are hardly reaching engineers, hampering the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.

  20. Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

    Directory of Open Access Journals (Sweden)

    Anália LOURENÇO

    2013-07-01

    Scientific publications are the main vehicle to disseminate information in the field of biotechnology for wastewater treatment. Indeed, the new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, error-prone and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs are hardly reaching engineers, hampering the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.

  1. Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

    Science.gov (United States)

    Alfarizy, A. D.; Indahwati; Sartono, B.

    2017-03-01

    In 2015, Indonesia was the largest target market for the Hollywood movie industry in Southeast Asia. Hollywood movies distributed in Indonesia target people of all ages, including children. Because awareness of the need to guide children while they watch movies is low, children can watch films of any rating, even those unsuitable for their age. Even after a film has been translated into Bahasa and has passed the censorship phase, words that are inappropriate for children remain. The purpose of this research is to cluster box office Hollywood movies based on Indonesian subtitles, revenue, IMDb user rating and genre, as a reference for adults choosing suitable movies for their children to watch. Text mining is used to extract words from the subtitles and count the frequency of three groups of words (bad words, sexual words and terror words), while the Partition Around Medoids (PAM) algorithm, with the Gower similarity coefficient as the proximity measure, is used as the clustering method. We clustered 624 movies from 2006 until the first half of 2016 from IMDb. The solution with the highest silhouette coefficient (0.36) is the one with 5 clusters. Animation, Adventure and Comedy movies with high revenue, as in cluster 5, are recommended for children to watch, while Comedy movies with high revenue, as in cluster 4, should be avoided.
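
The combination of a Gower-type dissimilarity with medoid-based clustering can be sketched as follows. This is not the authors' implementation: the movie records are invented, the feature layout is an assumption, and instead of the PAM build-and-swap heuristic the sketch simply tries every possible set of k medoids, which optimises the same objective but is only feasible for tiny data sets.

```python
from itertools import combinations

# Hypothetical records: (revenue in million USD, user rating, genre, bad-word count).
movies = [
    (650.0, 7.8, "Animation", 0),
    (700.0, 8.1, "Animation", 1),
    (120.0, 6.2, "Comedy", 45),
    (150.0, 6.0, "Comedy", 60),
    (400.0, 7.0, "Action", 20),
]
NUMERIC = (0, 1, 3)   # column indices of numeric features
CATEGORICAL = (2,)    # column indices of categorical features


def gower_matrix(rows):
    """Pairwise Gower dissimilarity: range-scaled differences for numeric
    features, simple mismatch (0 or 1) for categorical ones, averaged."""
    ranges = {}
    for j in NUMERIC:
        values = [r[j] for r in rows]
        ranges[j] = (max(values) - min(values)) or 1.0  # avoid division by zero
    n = len(rows)
    p = len(NUMERIC) + len(CATEGORICAL)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = sum(abs(rows[a][j] - rows[b][j]) / ranges[j] for j in NUMERIC)
            total += sum(rows[a][j] != rows[b][j] for j in CATEGORICAL)
            d[a][b] = d[b][a] = total / p
    return d


def k_medoids_exhaustive(d, k):
    """Choose the k medoids that minimise the total distance of every point
    to its nearest medoid, by brute force over all candidate sets."""
    n = len(d)

    def cost(medoids):
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    best = min(combinations(range(n), k), key=cost)
    labels = [min(best, key=lambda m: d[i][m]) for i in range(n)]
    return best, labels


dist = gower_matrix(movies)
medoids, labels = k_medoids_exhaustive(dist, k=2)
print("medoids:", medoids)
print("labels:", labels)  # each label is the row index of the assigned medoid
```

In practice the number of clusters would be chosen by comparing a quality measure such as the silhouette coefficient across several values of k, as the abstract describes.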

  2. Study on Students' Impression Data in Practical Training Using Text Mining Method-Analysis of Considerable Communication.

    Science.gov (United States)

    Teramachi, Hitomi; Sugita, Ikuto; Ino, Yoko; Hayashi, Yuta; Yoshida, Aki; Otsubo, Manami; Ueno, Anri; Katsuno, Hayato; Noguchi, Yoshihiro; Iguchi, Kazuhiro; Tachi, Tomoya

    2017-09-01

    We analyzed impression data and the scale of communication skills of students using a text mining method to clarify which areas students were conscious of in communication during practical training. The results revealed that students tended to be conscious of the difference between practical hospital training and practical pharmacy training. In practical hospital training, specific expressions denoting relationships were "patient-visit", "counseling-conduct", "patient-counseling", and "patient-talk". In practical pharmacy training, specific expressions denoting relationships were "patient counseling-conduct", "story-listen", "patient-many", and "patient-visit". In practical hospital training, the word "patient" was connected to many words, suggesting that students were conscious of patient-centered communication. In practical pharmacy training, words such as "patient counseling", "patient", and "explanation" were placed in the center and connected with many other words, and there was an independent relationship between "communication" and "accept". In conclusion, it was suggested that students attempted active patient-centered communication in practical hospital training, while they were conscious of listening closely in patient counseling in practical pharmacy training.

  3. pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

    Science.gov (United States)

    Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan

    2015-10-01

    The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with a focus on data visualization, they have limitations: they are slow, rigid and not available as open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast, with small elapsed times on regular workstations even for large corpus sizes and compute-intensive functions. The pubmed.mineR package is available at http://cran.rproject.org/web/packages/pubmed.mineR.

  4. Text mining electronic hospital records to automatically classify admissions against disease: Measuring the impact of linking data sources.

    Science.gov (United States)

    Kocbek, Simon; Cavedon, Lawrence; Martinez, David; Bain, Christopher; Manus, Chris Mac; Haffari, Gholamreza; Zukerman, Ingrid; Verspoor, Karin

    2016-12-01

    Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the research question regarding the impact of the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. Overall, linking data sources significantly improved classification performance for all the diseases examined. However, there is no single approach that suits all scenarios; the choice of the
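
The abstract mentions sub-sampling to deal with unbalanced data but does not say which variant was used. The sketch below shows one common option, random undersampling of the majority (negative) class, as an assumption for illustration only. The function name, the `ratio` parameter and the toy records are hypothetical.

```python
import random


def undersample(records, is_positive, ratio=1.0, seed=0):
    """Keep every positive record and a random subset of negatives,
    at most ratio * (number of positives) of them."""
    rng = random.Random(seed)
    positives = [r for r in records if is_positive(r)]
    negatives = [r for r in records if not is_positive(r)]
    keep = min(len(negatives), int(ratio * len(positives)))
    sample = positives + rng.sample(negatives, keep)
    rng.shuffle(sample)
    return sample


# Toy data: 100 admissions, 5 of them marked positive for a disease.
admissions = [(i, i < 5) for i in range(100)]
balanced = undersample(admissions, is_positive=lambda r: r[1])
print(len(balanced), sum(1 for r in balanced if r[1]))  # prints 10 5
```

Undersampling should be applied to the training split only, so that evaluation metrics still reflect the original class distribution.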

  5. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    K.M. Hettne (Kristina); J. Boorsma (Jeffrey); D.A.M. van Dartel (Dorien A M); J.J. Goeman (Jelle); E.C. de Jong (Esther); A.H. Piersma (Aldert); R.H. Stierum (Rob); J. Kleinjans (Jos); J.A. Kors (Jan)

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with

  6. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, D.A. van; Goeman, J.J.; Jong, E. de; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set

  7. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    NARCIS (Netherlands)

    Hettne, K.M.; Boorsma, A.; Dartel, van D.A.M.; Goeman, J.J.; Jong, de E.; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set

  8. Ask and Ye Shall Receive? Automated Text Mining of Michigan Capital Facility Finance Bond Election Proposals to Identify Which Topics Are Associated with Bond Passage and Voter Turnout

    Science.gov (United States)

    Bowers, Alex J.; Chen, Jingjing

    2015-01-01

    The purpose of this study is to bring together recent innovations in the research literature around school district capital facility finance, municipal bond elections, statistical models of conditional time-varying outcomes, and data mining algorithms for automated text mining of election ballot proposals to examine the factors that influence the…

  9. Text mining of rheumatoid arthritis and diabetes mellitus to understand the mechanisms of Chinese medicine in different diseases with same treatment.

    Science.gov (United States)

    Zhao, Ning; Zheng, Guang; Li, Jian; Zhao, Hong-Yan; Lu, Cheng; Jiang, Miao; Zhang, Chi; Guo, Hong-Tao; Lu, Ai-Ping

    2018-01-09

    To identify the commonalities between rheumatoid arthritis (RA) and diabetes mellitus (DM) to understand the mechanisms of Chinese medicine (CM) in different diseases with the same treatment. A text mining approach was adopted to analyze the commonalities between RA and DM according to CM and biological elements. The major commonalities were subsequently verified in RA and DM rat models, in which a herbal formula for the treatment of both RA and DM identified via text mining was used as the intervention. Similarities were identified between RA and DM regarding the CM approach used for diagnosis and treatment, as well as the networks of biological activities affected by each disease, including the involvement of adhesion molecules, oxidative stress, cytokines, T-lymphocytes, apoptosis, and inflammation. The Ramulus Cinnamomi-Radix Paeoniae Alba-Rhizoma Anemarrhenae is an herbal combination used to treat RA and DM. This formula demonstrated similar effects on oxidative stress and inflammation in rats with collagen-induced arthritis, which supports the text mining results regarding the commonalities between RA and DM. Commonalities between the biological activities involved in RA and DM were identified through text mining, and both RA and DM might be responsive to the same intervention at a specific stage.

  10. Evaluating a Bilingual Text-Mining System with a Taxonomy of Key Words and Hierarchical Visualization for Understanding Learner-Generated Text

    Science.gov (United States)

    Kong, Siu Cheung; Li, Ping; Song, Yanjie

    2018-01-01

    This study evaluated a bilingual text-mining system, which incorporated a bilingual taxonomy of key words and provided hierarchical visualization, for understanding learner-generated text in the learning management systems through automatic identification and counting of matching key words. A class of 27 in-service teachers studied a course…

  11. Examining Thematic Similarity, Difference, and Membership in Three Online Mental Health Communities from Reddit: A Text Mining and Visualization Approach.

    Science.gov (United States)

    Park, Albert; Conway, Mike; Chen, Annie T

    2018-01-01

    Social media, including online health communities, have become popular platforms for individuals to discuss health challenges and exchange social support with others. These platforms can provide support for individuals who are concerned about social stigma and discrimination associated with their illness. Although mental health conditions can share similar symptoms and even co-occur, the extent to which discussion topics in online mental health communities are similar, different, or overlapping is unknown. Discovering the topical similarities and differences could potentially inform the design of related mental health communities and patient education programs. This study employs text mining, qualitative analysis, and visualization techniques to compare discussion topics in publicly accessible online mental health communities for three conditions: Anxiety, Depression and Post-Traumatic Stress Disorder. First, online discussion content for the three conditions was collected from three Reddit communities (r/Anxiety, r/Depression, and r/PTSD). Second, content was pre-processed, and then clustered using the k-means algorithm to identify themes that were commonly discussed by members. Third, we qualitatively examined the common themes to better understand them, as well as their similarities and differences. Fourth, we employed multiple visualization techniques to form a deeper understanding of the relationships among the identified themes for the three mental health conditions. The three mental health communities shared four themes: sharing of positive emotion, gratitude for receiving emotional support, and sleep- and work-related issues. Depression clusters tended to focus on self-expressed contextual aspects of depression, whereas the Anxiety Disorders and Post-Traumatic Stress Disorder clusters addressed more treatment- and medication-related issues. Visualizations showed that discussion topics from the Anxiety Disorders and Post-Traumatic Stress Disorder subreddits

  12. What Online Communities Can Tell Us About Electronic Cigarettes and Hookah Use: A Study Using Text Mining and Visualization Techniques

    OpenAIRE

    Chen, Annie T; Zhu, Shu-Hong; Conway, Mike

    2015-01-01

    © 2015 Journal of Medical Internet Research. Background: The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people's experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a no...

  13. GIS-assisted spatial analysis for urban regulatory detailed planning: designer's dimension in the Chinese code system

    Science.gov (United States)

    Yu, Yang; Zeng, Zheng

    2009-10-01

    By discussing the causes behind the high amendment ratio in the implementation of urban regulatory detailed plans in China despite their law-ensured status, the study aims to reconcile the conflict between the legal authority of regulatory detailed planning and the insufficient scientific support in its decision-making and compilation. It does so by introducing into the process spatial analysis based on GIS technology and 3D modeling, thereby presenting a more scientific and flexible approach to regulatory detailed planning in China. The study first points out that the current compilation process of urban regulatory detailed plans in China employs mainly an empirical approach, which renders it constantly subject to amendments; the study then discusses the need for and current utilization of GIS in the Chinese system and proposes the framework of a GIS-assisted 3D spatial analysis process from the designer's perspective, which can be regarded as an alternating process between the descriptive codes and physical design in the compilation of regulatory detailed planning. With a case study of the processes and results from the application of the framework, the paper concludes that the proposed framework can be an effective instrument which provides more rationality, flexibility and thus more efficiency to the compilation and decision-making process of urban regulatory detailed plans in China.

  14. Application of Text Mining to Extract Hotel Attributes and Construct Perceptual Map of Five Star Hotels from Online Review: Study of Jakarta and Singapore Five-Star Hotels

    Directory of Open Access Journals (Sweden)

    Arga Hananto

    2015-12-01

    The use of post-purchase online consumer review in hotel attributes study was still scarce in the literature. Arguably, post-purchase online review data would yield more accurate attributes that consumers actually consider in their purchase decision. This study aims to extract attributes from two samples of five-star hotel reviews (Jakarta and Singapore) with text mining methodology. In addition, this study also aims to describe positioning of five-star hotels in Jakarta and Singapore based on the extracted attributes using Correspondence Analysis. This study finds that reviewers of five-star hotels in both cities mentioned similar attributes such as service, staff, club, location, pool and food. Attributes derived from text mining seem to be viable input to build a fairly accurate positioning map of hotels. This study has demonstrated the viability of online review as a source of data for hotel attribute and positioning studies.

  15. Finding novel relationships with integrated gene-gene association network analysis of Synechocystis sp. PCC 6803 using species-independent text-mining.

    Science.gov (United States)

    Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R

    2018-01-01

    The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to (i) discover novel candidate associations between different genes or proteins in the network, and (ii) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.
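
    The two filter rules described in the abstract above can be sketched roughly as follows. This is an illustration only, not the authors' algorithm: the function name `novel_candidates`, the `min_sources` threshold, the edge format and the gene labels are all hypothetical.

```python
from collections import defaultdict

def novel_candidates(edges, known_pairs, targets, min_sources=2):
    """Return (candidate, target, n_sources) tuples for gene pairs that are
    not already known and are backed by at least `min_sources` sources.

    `edges` is an iterable of (gene_a, gene_b, source) triples (hypothetical format).
    """
    # Collect the distinct evidence sources behind each unordered gene pair.
    sources = defaultdict(set)
    for gene_a, gene_b, source in edges:
        sources[frozenset((gene_a, gene_b))].add(source)

    known = {frozenset(pair) for pair in known_pairs}
    target_set = set(targets)
    results = []
    for pair, evidence in sources.items():
        if pair in known:                # rule 1: ignore established relationships
            continue
        if len(evidence) < min_sources:  # rule 2: demand multiple independent evidences
            continue
        hit = pair & target_set
        if len(pair) == 2 and len(hit) == 1:  # exactly one end is a known target
            (target,) = hit
            (candidate,) = pair - hit
            results.append((candidate, target, len(evidence)))
    return sorted(results)

# Hypothetical example data: "t1" is a target gene, "g1".."g3" are other genes.
edges = [
    ("t1", "g1", "text-mining"), ("t1", "g1", "coexpression"),
    ("t1", "g2", "text-mining"),
    ("t1", "g3", "text-mining"), ("t1", "g3", "coexpression"),
]
result = novel_candidates(edges, known_pairs=[("t1", "g3")], targets={"t1"})
```

    Here only "g1" survives: "g2" has a single source and the "t1"/"g3" pair is already known.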

  16. Exploratory analysis of textual data from the Mother and Child Handbook using the text-mining method: Relationships with maternal traits and post-partum depression.

    Science.gov (United States)

    Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Sato, Shuhei; Ohwada, Michitaka

    2016-06-01

    The aim of the present study was to examine the possibility of screening apprehensive pregnant women and mothers at risk for post-partum depression from an analysis of the textual data in the Mother and Child Handbook by using the text-mining method. Uncomplicated pregnant women (n = 58) were divided into two groups according to State-Trait Anxiety Inventory grade (high trait [group I, n = 21] and low trait [group II, n = 37]) or Edinburgh Postnatal Depression Scale score (high score [group III, n = 15] and low score [group IV, n = 43]). An exploratory analysis of the textual data from the Maternal and Child Handbook was conducted using the text-mining method with the Word Miner software program. A comparison of the 'structure elements' was made between the two groups. The number of structure elements extracted by separated words from text data was 20 004 and the number of structure elements with a threshold of 2 or more as an initial value was 1168. Fifteen key words related to maternal anxiety, and six key words related to post-partum depression were extracted. The text-mining method is useful for the exploratory analysis of textual data obtained from pregnant women, and this screening method has been suggested to be useful for apprehensive pregnant women and mothers at risk for post-partum depression. © 2016 Japan Society of Obstetrics and Gynecology.

  17. Exploratory analysis of textual data from the Mother and Child Handbook using a text mining method (II): Monthly changes in the words recorded by mothers.

    Science.gov (United States)

    Tagawa, Miki; Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Ohwada, Michitaka; Matsubara, Shigeki

    2017-01-01

    The aim of the study was to examine the possibility of converting subjective textual data written in the free column space of the Mother and Child Handbook (MCH) into objective information using text mining and to compare any monthly changes in the words written by the mothers. Pregnant women without complications (n = 60) were divided into two groups according to State-Trait Anxiety Inventory grade: low trait anxiety (group I, n = 39) and high trait anxiety (group II, n = 21). Exploratory analysis of the textual data from the MCH was conducted by text mining using the Word Miner software program. Using 1203 structural elements extracted after processing, a comparison of monthly changes in the words used in the mothers' comments was made between the two groups. The data was mainly analyzed by a correspondence analysis. The structural elements in groups I and II were divided into seven and six clusters, respectively, by cluster analysis. Correspondence analysis revealed clear monthly changes in the words used in the mothers' comments as the pregnancy progressed in group I, whereas the association was not clear in group II. The text mining method was useful for exploratory analysis of the textual data obtained from pregnant women, and the monthly change in the words used in the mothers' comments as pregnancy progressed differed according to their degree of unease. © 2016 Japan Society of Obstetrics and Gynecology.

  18. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes.

    Science.gov (United States)

    Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin

    2017-07-03

    Considerable effort has been devoted to systematically retrieving information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Anomaly Detection with Text Mining

    Data.gov (United States)

    National Aeronautics and Space Administration — Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The...

  20. Text Mining the Biomedical Literature

    Science.gov (United States)

    2007-11-05

    Mexico. http://www.cie.unam.mx/W_Reportes. 20. Kostoff, R. N., and Del Rio, J. A., “The Impact of Physics Research”, Physics World, June 2001. 21...NAILFOLD CAPILLARY MICROSCOPY *** *** CARDIOVASCULAR/PULMONARY CIRCULATION PROBLEMS *** ** *** *** BIOFEEDBACK AND AUTOGENIC TRAINING...e.g., tuberculosis). A polling of numerous medical experts did not identify any database that contains patient lateral non-cancer chronic disease

  1. SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data

    Science.gov (United States)

    Talo, Francesco; Ide-Smith, Michele; Gobeill, Julien; Carter, Jacob; Batista-Navarro, Riza; Ananiadou, Sophia; Ruch, Patrick; McEntyre, Johanna

    2017-01-01

    The tremendous growth in biological data has resulted in an increase in the number of research papers being published. This presents a great challenge for scientists in searching and assimilating facts described in those papers. Particularly, biological databases depend on curators to add highly precise and useful information that is usually extracted by reading research articles. Therefore, there is an urgent need to find ways to improve linking literature to the underlying data, thereby minimising the effort in browsing content and identifying key biological concepts. As part of the development of Europe PMC, we have developed a new platform, SciLite, which integrates text-mined annotations from different sources and overlays those outputs on research articles. The aim is to aid researchers and curators using Europe PMC in finding key concepts more easily and provide links to related resources or tools, bridging the gap between literature and biological data. PMID:28948232

  2. Text-mining as a methodology to assess eating disorder-relevant factors: Comparing mentions of fitness tracking technology across online communities.

    Science.gov (United States)

    McCaig, Duncan; Bhatia, Sudeep; Elliott, Mark T; Walasek, Lukasz; Meyer, Caroline

    2018-05-07

    Text-mining offers a technique to identify and extract information from a large corpus of textual data. As an example, this study presents the application of text-mining to assess and compare interest in fitness tracking technology across eating disorder and health-related online communities. A list of fitness tracking technology terms was developed, and communities (i.e., 'subreddits') on a large online discussion platform (Reddit) were compared regarding the frequency with which these terms occurred. The corpus used in this study comprised all comments posted between May 2015 and January 2018 (inclusive) on six subreddits: three eating disorder-related, and three relating to either fitness, weight-management, or nutrition. All comments relating to the same 'thread' (i.e., conversation) were concatenated, and formed the cases used in this study (N = 377,276). Within the eating disorder-related subreddits, the findings indicated that a 'pro-eating disorder' subreddit, which is less recovery focused than the other eating disorder subreddits, had the highest frequency of fitness tracker terms. Across all subreddits, the weight-management subreddit had the highest frequency of the fitness tracker terms' occurrence, and MyFitnessPal was the most frequently mentioned fitness tracker. The technique exemplified here can potentially be used to assess group differences to identify at-risk populations, generate and explore clinically relevant research questions in populations who are difficult to recruit, and scope an area for which there is little extant literature. The technique also facilitates methodological triangulation of research findings obtained through more 'traditional' techniques, such as surveys or interviews. © 2018 Wiley Periodicals, Inc.
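
    A comparison of term frequency across communities, as described above, might look something like the sketch below. The metric (share of threads mentioning at least one term), the function name `term_rate`, the two-item term list and the sample threads are all assumptions made for illustration; the abstract does not specify how frequency was computed.

```python
import re

def term_rate(threads, terms):
    """Share of threads mentioning at least one term (case-insensitive, whole word)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    if not threads:
        return 0.0
    hits = sum(1 for text in threads if pattern.search(text))
    return hits / len(threads)

# Hypothetical term list and threads; each string stands for one concatenated thread.
terms = ["Fitbit", "MyFitnessPal"]
communities = {
    "fitness": ["I log every run on my Fitbit", "Rest day today"],
    "nutrition": [
        "MyFitnessPal says 1800 kcal",
        "fitbit and myfitnesspal both synced",
        "Meal prep ideas?",
    ],
}
rates = {name: term_rate(threads, terms) for name, threads in communities.items()}
```

    Normalising by the number of threads keeps communities of different sizes comparable.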

  3. Are Female Applicants Disadvantaged in National Institutes of Health Peer Review? Combining Algorithmic Text Mining and Qualitative Methods to Detect Evaluative Differences in R01 Reviewers' Critiques.

    Science.gov (United States)

    Magua, Wairimu; Zhu, Xiaojin; Bhattacharya, Anupama; Filut, Amarette; Potvien, Aaron; Leatherberry, Renee; Lee, You-Geon; Jens, Madeline; Malikireddy, Dastagiri; Carnes, Molly; Kaatz, Anna

    2017-05-01

    Women are less successful than men in renewing R01 grants from the National Institutes of Health. Continuing to probe text mining as a tool to identify gender bias in peer review, we used algorithmic text mining and qualitative analysis to examine a sample of critiques from men's and women's R01 renewal applications previously analyzed by counting and comparing word categories. We analyzed 241 critiques from 79 Summary Statements for 51 R01 renewals awarded to 45 investigators (64% male, 89% white, 80% PhD) at the University of Wisconsin-Madison between 2010 and 2014. We used latent Dirichlet allocation to discover evaluative "topics" (i.e., words that co-occur with high probability). We then qualitatively examined the context in which evaluative words occurred for male and female investigators. We also examined sex differences in assigned scores controlling for investigator productivity. Text analysis results showed that male investigators were described as "leaders" and "pioneers" in their "fields," with "highly innovative" and "highly significant research." By comparison, female investigators were characterized as having "expertise" and working in "excellent" environments. Applications from men received significantly better priority, approach, and significance scores, which could not be accounted for by differences in productivity. Results confirm our previous analyses suggesting that gender stereotypes operate in R01 grant peer review. Reviewers may more easily view male than female investigators as scientific leaders with significant and innovative research, and score their applications more competitively. Such implicit bias may contribute to sex differences in award rates for R01 renewals.

  4. The Effects of Self-Regulatory Learning through Computer-Assisted Intelligent Tutoring System on the Improvement of EFL Learners' Speaking Ability

    Science.gov (United States)

    Mohammadzadeh, Ahmad; Sarkhosh, Mehdi

    2018-01-01

    The current study attempted to investigate the effects of self-regulatory learning through computer-assisted intelligent tutoring system on the improvement of speaking ability. The participants of the study, who spoke Azeri Turkish as their mother tongue, were students of Applied Linguistics at BA level at Pars Abad's Azad University, Ardebil,…

  5. Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value.

    Science.gov (United States)

    Bishop, Dorothy V M; Thompson, Paul A

    2016-01-01

    Background. The p-curve is a plot of the distribution of p-values reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication, p-hacking. Methods. p-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking. Results. We show that when there is ghost p-hacking, the shape of the p-curve depends on whether dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not give the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, though there is a negative skew when simulated variables are inter-correlated. The way p-curves vary according to features of underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers. Conclusions. The absence of a bump in the p-curve is not indicative of lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value, unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis. In particular, p-hacking with ghost variables is likely to be missed.
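
    The idea of ghost variables can be illustrated with a small simulation. The sketch below is not the authors' R code: it is a Python stand-in that assumes uncorrelated dependent variables, no true effect, and a z-test with known unit variance, with group size and number of variables chosen arbitrarily.

```python
import random
from statistics import NormalDist, mean

def simulate_study(rng, n_per_group=20, n_dvs=5, alpha=0.05):
    """Simulate one null study with `n_dvs` uncorrelated dependent variables
    and return only the p-values that would be reported (those below alpha)."""
    std_normal = NormalDist()
    reported = []
    for _ in range(n_dvs):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # z-test for a difference in means; the variance of the difference is 2/n.
        z = (mean(group_a) - mean(group_b)) / (2 / n_per_group) ** 0.5
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:  # non-significant variables become unreported "ghosts"
            reported.append(p)
    return reported

rng = random.Random(1)
studies = [simulate_study(rng) for _ in range(2000)]
reported_p = [p for study in studies for p in study]
share_with_result = sum(1 for study in studies if study) / len(studies)
```

    With five independent variables and no true effect, about 1 - 0.95**5, or roughly 23%, of studies are expected to have at least one "significant" result to report; a histogram of `reported_p` then gives the p-curve for this scenario.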

  6. Networks Models of Actin Dynamics during Spermatozoa Postejaculatory Life: A Comparison among Human-Made and Text Mining-Based Models

    Directory of Open Access Journals (Sweden)

    Nicola Bernabò

    2016-01-01

    Here we realized a networks-based model representing the process of actin remodelling that occurs during the acquisition of fertilizing ability of human spermatozoa (HumanMade_ActinSpermNetwork, HM_ASN). Then, we compared it with the networks provided by two different text mining tools: Agilent Literature Search (ALS) and PESCADOR. As a reference, we used the data from the online repository Kyoto Encyclopaedia of Genes and Genomes (KEGG), which refer to actin dynamics in a more general biological context. We found that HM_ASN and the networks from KEGG data shared the same scale-free topology following the Barabasi-Albert model, thus suggesting that the information is spread within the network quickly and efficiently. On the contrary, the networks obtained by ALS and PESCADOR have a scale-free hierarchical architecture, which implies a different pattern of information transmission. Also, the hubs identified within the networks are different: HM_ASN and KEGG networks contain as hubs several molecules known to be involved in actin signalling; ALS was unable to find hubs other than “actin,” whereas PESCADOR gave some nonspecific results. This seems to suggest that human-made information retrieval in the case of a specific event, such as actin dynamics in human spermatozoa, could be a reliable strategy.

  7. Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.

    Science.gov (United States)

    Wagner, Mathias; Vicinus, Benjamin; Muthra, Sherieda T; Richards, Tereza A; Linder, Roland; Frick, Vilma Oliveira; Groh, Andreas; Rubie, Claudia; Weichert, Frank

    2016-06-01

    The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing which is neither unitary, transcending social situation nor defined by a timeless idea is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining. A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL Scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features. The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation were coherent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable. The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Prodromal signs and symptoms of serious infections with tocilizumab treatment for rheumatoid arthritis: Text mining of the Japanese postmarketing adverse event-reporting database.

    Science.gov (United States)

    Atsumi, Tatsuya; Ando, Yoshiaki; Matsuda, Shinichi; Tomizawa, Shiho; Tanaka, Riwa; Takagi, Nobuhiro; Nakasone, Ayako

    2018-05-01

    To search for signs and symptoms before serious infection (SI) occurs in tocilizumab (TCZ)-treated rheumatoid arthritis (RA) patients. Individual case safety reports, including structured (age, sex, adverse event [AE]) and unstructured (clinical narratives) data, were analyzed by automated text mining from a Japanese post-marketing AE-reporting database (16 April 2008-10 April 2015) assuming the following: treated in Japan; TCZ RA treatment; ≥1 SI; unable to exclude causality between TCZ and SIs. The database included 7653 RA patients; 1221 reports met four criteria, encompassing 1591 SIs. Frequent SIs were pneumonia (15.9%), cellulitis (9.9%), and sepsis (5.0%). Reports for 782 patients included SI onset date; 60.7% of patients had signs/symptoms ≤28 days before SI diagnosis, 32.7% had signs/symptoms with date unidentified, 1.7% were asymptomatic, and 4.9% had unknown signs/symptoms. The most frequent signs/symptoms were for skin (swelling and pain) and respiratory (cough and pyrexia) infections. Among 68 patients who had normal laboratory results for C-reactive protein, body temperature, and white blood cell count, 94.1% had signs or symptoms of infection. This study identified prodromal signs and symptoms of SIs in RA patients receiving TCZ. Data mining clinical narratives from post-marketing AE databases may be beneficial in characterizing SIs.

  9. CGMIM: Automated text-mining of Online Mendelian Inheritance in Man (OMIM) to identify genetically-associated cancers and candidate genes

    Directory of Open Access Journals (Sweden)

    Jones Steven

    2005-03-01

    Background: Online Mendelian Inheritance in Man (OMIM) is a computerized database of information about genes and heritable traits in human populations, based on information reported in the scientific literature. Our objective was to establish an automated text-mining system for OMIM that will identify genetically-related cancers and cancer-related genes. We developed the computer program CGMIM to search for entries in OMIM that are related to one or more cancer types. We performed manual searches of OMIM to verify the program results. Results: In the OMIM database on September 30, 2004, CGMIM identified 1943 genes related to cancer. BRCA2 (OMIM *164757), BRAF (OMIM *164757) and CDKN2A (OMIM *600160) were each related to 14 types of cancer. There were 45 genes related to cancer of the esophagus, 121 genes related to cancer of the stomach, and 21 genes related to both. Analysis of CGMIM results indicates that fewer than three gene entries in OMIM should mention both, and the more than seven-fold discrepancy suggests cancers of the esophagus and stomach are more genetically related than current literature suggests. Conclusion: CGMIM identifies genetically-related cancers and cancer-related genes. In several ways, cancers with shared genetic etiology are anticipated to lead to further etiologic hypotheses and advances regarding environmental agents. CGMIM results are posted monthly and the source code can be obtained free of charge from the BC Cancer Research Centre website http://www.bccrc.ca/ccr/CGMIM.
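
    The "fewer than three" and "more than seven-fold" figures above are consistent with a simple independence calculation. That reading is an assumption, since the abstract does not state how the expected count was derived. Using only the counts reported in the abstract:

```python
# Counts reported in the abstract.
total_cancer_genes = 1943
esophagus_genes = 45
stomach_genes = 121
observed_both = 21

# Assumption: if the two gene lists were independent, the expected overlap is
# total * P(esophagus) * P(stomach) = esophagus * stomach / total.
expected_both = esophagus_genes * stomach_genes / total_cancer_genes

# Ratio of observed to expected overlap.
fold_discrepancy = observed_both / expected_both

print(f"expected overlap: {expected_both:.2f}")
print(f"fold discrepancy: {fold_discrepancy:.2f}")
```

    Under this assumption the expected overlap comes to about 2.8 genes, and the 21 observed are about 7.5 times that.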

  10. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    Science.gov (United States)

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
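
    The 'Given X, find all associated Ys' query pattern can be illustrated with a toy sentence-level co-occurrence count. The sketch below is not PolySearch2's algorithm or API; the function name `find_associated`, the substring matching, and the tiny corpus are all hypothetical, and the sentences are invented for illustration rather than taken from any real source.

```python
import re
from collections import Counter


def find_associated(query, candidates, documents):
    """Rank candidate terms by the number of sentences that mention both
    the query term and the candidate (case-insensitive substring match)."""
    counts = Counter()
    q = query.lower()
    for doc in documents:
        # Naive sentence split: break after '.', '!' or '?' plus whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            s = sentence.lower()
            if q not in s:
                continue
            for cand in candidates:
                if cand.lower() in s:
                    counts[cand] += 1
    # Candidates that never co-occur with the query are omitted.
    return counts.most_common()


# Invented toy corpus; these sentences are not real findings.
docs = [
    "Bisphenol A exposure was linked to obesity. Asthma was not studied.",
    "Obesity risk rose with bisphenol A levels. Bisphenol A may affect diabetes.",
]
ranked = find_associated("Bisphenol A", ["obesity", "diabetes", "asthma"], docs)
print(ranked)
```

    A production system would add the pieces the abstract mentions, such as a thesaurus for synonym matching and relevancy statistics; raw co-occurrence counts are only the simplest possible ranking signal.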

  11. HLA-G regulatory haplotypes and implantation outcome in couples who underwent assisted reproduction treatment.

    Science.gov (United States)

    Costa, Cynthia Hernandes; Gelmini, Georgia Fernanda; Wowk, Pryscilla Fanini; Mattar, Sibelle Botogosque; Vargas, Rafael Gustavo; Roxo, Valéria Maria Munhoz Sperandio; Schuffner, Alessandro; Bicalho, Maria da Graça

    2012-09-01

    The role of HLA-G in several clinical conditions related to reproduction has been investigated. Important polymorphisms have been found within the 5'URR and 3'UTR regions of the HLA-G promoter. The aim of the present study was to investigate 16 SNPs in the 5'URR and the 14-bp insertion/deletion (ins/del) polymorphism located in the 3'UTR region of the HLA-G gene and their possible association with the implantation outcome in couples who underwent assisted reproduction treatments (ART). The case group was composed of 25 ART couples. Ninety-four couples with two or more term pregnancies composed the control group. Polymorphism haplotype frequencies of HLA-G were determined for both groups. Haplotype 5, Haplotype 8 and Haplotype 11 were completely absent in ART couples. The HLA-G*01:01:02a and HLA-G*01:01:02b alleles and the 14-bp ins polymorphism (Haplotype 2) showed an increased frequency in case women and a similar distribution between case and control men. However, this susceptibility haplotype is significantly represented in case women and in couples with implantation failure after treatment, which led us to suggest a maternal effect associated with this haplotype, since its presence in women is related to a higher number of couples who underwent ART. Copyright © 2012. Published by Elsevier Inc.

  12. Core Values in Nursing Care Based on the Experiences of Nurses Engaged in Neonatal Nursing: A Text-mining Approach for Analyzing Reflection Records

    Science.gov (United States)

    Watanabe, Hiromi; Okuda, Reiko; Hagino, Hiroshi

    2018-01-01

    Background: Strong feelings about and enthusiasm for nursing care are reflected in nurses’ thoughts and behaviors in clinical practice and affect their profession. This study was conducted to identify the characteristics of core values in nursing care based on the experiences of nurses engaged in neonatal nursing through a process for recognizing the conceptualization of nursing. Methods: We conceptualized nursing care in 43 nurses who were involved in neonatal nursing using a reflection sheet. We classified descriptions on a sheet based on the Three-Staged Recognition scheme and analyzed them using a text-mining approach. Results: Nurses involved in neonatal nursing recognized that they must take care of the “child,” “mother,” and “family.” Important elements of nursing in nurses with less than 5 years versus 5 or more years of neonatal nursing experience were classified into seven clusters, respectively. These elements were mainly related to family members in both groups. In nurses with less than 5 years of experience, four clusters of one-way communication by nurses were observed in the analysis of the key elements in nursing. On the other hand, five clusters of mutual relationships between patients, their family members, and nurses were observed in nurses with 5 or more years of experience. Conclusion: The core value of nurses engaged in neonatal nursing is family-oriented nursing. Nurses with 5 or more years of neonatal nursing experience understand patients and their family members well through establishing relationships and providing comfort and safety while taking care of them. PMID:29599621

  13. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    Directory of Open Access Journals (Sweden)

    Hettne Kristina M

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values [...] Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a similar toxicity pattern as the triazoles. We confirmed embryotoxic effects, and discriminated triazoles from other chemicals. Conclusions: Gene set analysis with next-gen TM-derived chemical response-specific gene sets is a scalable method for identifying similarities in gene responses to other chemicals, from which one may infer potential mode of action and/or toxic effect.

  14. Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

    Science.gov (United States)

    2013-01-01

    Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set analysis (GSA) methods for chemical treatment identification, for pharmacological mechanism elucidation, and for comparing compound toxicity profiles. Methods: We created 30,211 chemical response-specific gene sets for human and mouse by next-gen TM, and derived 1,189 (human) and 588 (mouse) gene sets from the Comparative Toxicogenomics Database (CTD). We tested for significant differential expression (SDE) (false discovery rate-corrected p-values [...] sets and the CTD-derived gene sets in gene expression (GE) data sets of five chemicals (from experimental models). We tested for SDE of gene sets for six fibrates in a peroxisome proliferator-activated receptor alpha (PPARA) knock-out GE dataset and compared to results from the Connectivity Map. We tested for SDE of 319 next-gen TM-derived gene sets for environmental toxicants in three GE data sets of triazoles, and tested for SDE of 442 gene sets associated with embryonic structures. We compared the gene sets to triazole effects seen in the Whole Embryo Culture (WEC), and used principal component analysis (PCA) to discriminate triazoles from other chemicals. Results: Next-gen TM-derived gene sets matching the chemical treatment were significantly altered in three GE data sets, and the corresponding CTD-derived gene sets were significantly altered in five GE data sets. Six next-gen TM-derived and four CTD-derived fibrate gene sets were significantly altered in the PPARA knock-out GE dataset. None of the fibrate signatures in cMap scored significant against the PPARA GE signature. 33 environmental toxicant gene sets were significantly altered in the triazole GE data sets. 21 of these toxicants had a similar toxicity pattern as the

  15. Identifying topics of interest of Mendeley users using the text mining and overlay visualization functionality of VOS viewer. 20th International Conference in Science & Technology Indicators, 2-4, September, 2015, Lugano, Switzerland

    NARCIS (Netherlands)

    Zahedi, Z.; Van Eck, N.J.P.

    2015-01-01

    This paper presents the results of a study in which we have analysed the topics of interest of Mendeley users (i.e. Students, PhDs, Post Docs, Researchers, Professors, Librarians, Lecturers & other Professionals) using text mining and visualization techniques. Beside analyzing topics of interest of

  16. Regulatory Assistance, Stakeholder Outreach, and Coastal and Marine Spatial Planning Activities in Support of Marine and Hydrokinetic Energy Deployment

    Energy Technology Data Exchange (ETDEWEB)

    Geerlofs, Simon H. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Copping, Andrea E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Van Cleve, Frances B. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Blake, Kara M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hanna, Luke A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2011-09-01

    This fiscal year 2011 progress report summarizes activities carried out under DOE Water Power Task 2.1.7, Permitting and Planning. Activities under Task 2.1.7 address the concerns of a wide range of stakeholders with an interest in the development of the marine and hydrokinetic (MHK) energy industry, including regulatory and resource management agencies, tribes, nongovernmental organizations, and industry.

  17. 10 CFR Appendix A to Part 5 - List of Federal Financial Assistance Administered by the Nuclear Regulatory Commission to Which...

    Science.gov (United States)

    2010-01-01

    ... COMMISSION NONDISCRIMINATION ON THE BASIS OF SEX IN EDUCATION PROGRAMS OR ACTIVITIES RECEIVING FEDERAL... regulatory programs and related matters at NRC facilities and offices, or other locations. (b) Orientations... recovery, to receive orientation and on-the-job instruction at NRC facilities and offices, or other...

  18. Application of text mining techniques to analyze student interactions in the project management learning process

    OpenAIRE

    Olarte Valentín, Rubén; González Marcos, Ana; Alba Elías, Fernando; Ordieres-Meré, Joaquín

    2016-01-01

    This article presents the application of text mining techniques to analyze the online communication of students working together on the same project, in order to identify the emergence of problems during the project management learning experience. The data used in this study are the messages that the students exchanged through the communication tools available in the web platform used specifically for the develop...

  19. Morally-Relevant Similarities and Differences Between Assisted Dying Practices in Paradigm and Non-Paradigm Circumstances: Could They Inform Regulatory Decisions?

    Science.gov (United States)

    Kirby, Jeffrey

    2017-12-01

    There has been contentious debate over the years about whether there are morally relevant similarities and differences between the three practices of continuous deep sedation until death, physician-assisted suicide, and voluntary euthanasia. Surprisingly little academic attention has been paid to a comparison of the uses of these practices in the two types of circumstances in which they are typically performed. A comparative domains of ethics analysis methodological approach is used in the paper to compare 1) the use of the three practices in paradigm circumstances, and 2) the use of the practices in paradigm circumstances to their use in non-paradigm circumstances. The analytical outcomes suggest that a bright moral line cannot be demonstrated between any two of the practices in paradigm circumstances, and that there are significant, morally-relevant distinctions between their use in paradigm and non-paradigm circumstances. A thought experiment is employed to illustrate how these outcomes could possibly inform the decisions of hypothetical deliberators who are engaged in the collaborative development of assisted dying regulatory frameworks.

  20. International assistance. Licensing assistance project

    International Nuclear Information System (INIS)

    Aleev, A.

    1999-01-01

    A description of the licensing assistance project for VATESI is presented. In the licensing of unit No. 1 of INPP, VATESI is supported by many western countries. Experts from regulatory bodies or scientific organizations of those countries assist VATESI staff in reviewing documentation presented by INPP. In addition to bilateral cooperation, support is provided by the European Commission through the Phare programme

  1. Exploring Dimensionality Reduction for Text Mining

    Science.gov (United States)

    2007-05-04

    result is v: $v = \sqrt{K_2 \cdot \mathbf{1}}$ (3.20) 6) Define $K_3$ to be the element-by-element division of $K_2$ by the product of $v$ and its transpose: $K_{3\,i,j} = K_{2\,i,j}/(v \cdot v^{T}$...3.21) 7) Compute the singular value decomposition of $K_3$ to get $U$, $D$, and $V$ as specified in Equation 3.5. 8) The output points can then be...adequate safety studies. Procter and Gamble agrees that olestra helps carry away fat-soluble vitamins such as A, D, E, and K. Indeed, the firm plans to

  2. Science and Technology Text Mining: Electrochemical Power

    Science.gov (United States)

    2003-07-14

    electrodes) and improvements based on component materials (glassy carbon, carbon fibers, aerogels, thin films). A focal point of electrochemical capacitor...performance of carbon aerogels; and the fabrication and application of Cu-carbon composite (prepared from sawdust) to electrochemical capacitor electrodes. xi...applications require decreases in size and weight, especially for space, aircraft, and individual soldier or small team applications. For large volumes

  3. Validating Curriculum Development Using Text Mining

    Science.gov (United States)

    West, Jason

    2017-01-01

    Interdisciplinarity requires the collaboration of two or more disciplines to combine their expertise to jointly develop and deliver learning and teaching outcomes appropriate for a subject area. Curricula and assessment mapping are critical components to foster and enhance interdisciplinary learning environments. Emerging careers in data science…

  4. Analysing Customer Opinions with Text Mining Algorithms

    Science.gov (United States)

    Consoli, Domenico

    2009-08-01

    Knowing what the customer thinks of a particular product/service helps top management to introduce improvements in processes and products, thus differentiating the company from their competitors and gain competitive advantages. The customers, with their preferences, determine the success or failure of a company. In order to know opinions of the customers we can use technologies available from the web 2.0 (blog, wiki, forums, chat, social networking, social commerce). From these web sites, useful information must be extracted, for strategic purposes, using techniques of sentiment analysis or opinion mining.

  5. Nuclear Regulatory legislation

    International Nuclear Information System (INIS)

    1984-06-01

    This compilation of statutes and material pertaining to nuclear regulatory legislation through the 97th Congress, 2nd Session, has been prepared by the Office of the Executive Legal Director, U.S. Nuclear Regulatory Commission, with the assistance of staff, for use as an internal resource document

  6. Exploring the collection of RAE-Revista de Administração de Empresas (1961 to 2016) in the light of bibliometrics, text mining, social networks and geoanalysis

    Directory of Open Access Journals (Sweden)

    José Eduardo Ricciardi Favaretto

    2017-08-01

    This article examined more than five decades of the Revista de Administração de Empresas (Journal of Business Administration) [RAE], between 1961 and 2016, by accessing documents made available on the internet in the electronic repository of periodicals and magazines of the Biblioteca Digital da Fundação Getulio Vargas-Escola de Administração de Empresas de São Paulo (Digital Library of the Getulio Vargas Foundation-School of Business Administration of São Paulo), which follows the Open Archives Initiative-Protocol for Metadata Harvesting (OAI-PMH) for interoperability between digital repositories. A total of 2,381 documents published in the journal (1,422 articles, 217 editorials, 62 opinion articles, and 680 reviews) were collected through an automated process and later analyzed using techniques such as bibliometrics, text mining, social networking, and geo-analysis. This study enables understanding of the path that the RAE journal has followed throughout its existence, including 22 different management periods, the increase of authorship within its publications during 14 time intervals, the most frequent and important terms and keywords appearing in its published documents, and the formation of co-authoring networks of researchers who contribute to the development of Administration science in the country.
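
    An automated collection process over an OAI-PMH repository typically pages through `ListRecords` responses using resumption tokens. The sketch below shows that request/response shape under stated assumptions: it is not the authors' harvesting code, the endpoint `https://repository.example.org/oai` is hypothetical, the sample XML is hand-written, and no network call is made. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace are part of the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = token.text if token is not None and token.text else None
    return ids, next_token


BASE = "https://repository.example.org/oai"  # hypothetical endpoint

# Hand-written sample response standing in for one page of results.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_page(SAMPLE)
print(ids, token)
print(list_records_url(BASE))
print(list_records_url(BASE, resumption_token=token))
```

    A real harvester would fetch each URL, call `parse_page` on the response, and repeat until no resumption token is returned.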

  7. A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques; Metodologia para extracao semiautomatica de uma taxonomia de conceitos a partir da producao cientifica da area nuclear utilizando tecnicas de mineracao de textos

    Energy Technology Data Exchange (ETDEWEB)

    Braga, Fabiane dos Reis

    2013-07-01

    This thesis presents a text mining method for semi-automatic extraction of a taxonomy of concepts from a textual corpus composed of scientific papers related to the nuclear area. Text classification is a natural human practice and a crucial task when working with large repositories. Document clustering provides a logical and understandable framework that facilitates organization, browsing and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model generates high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider the relationships between words, assuming that they are independent of each other. The methodology combines a concept-based model for document representation with a hierarchical document clustering method that uses the co-occurrence frequency of concepts, and a technique for more representative cluster labeling, with the objective of producing a taxonomy of concepts that may reflect the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of scientific production in the nuclear area and thus support the management of research activities in this area. (author)

  8. Data preparation for municipal virtual assistant using machine learning

    OpenAIRE

    Jovan, Leon Noe

    2016-01-01

    The main goal of this master’s thesis was to develop a procedure that will automate the construction of the knowledge base for a virtual assistant that answers questions about municipalities in Slovenia. The aim of the procedure is to replace or facilitate manual preparation of the virtual assistant's knowledge base. Theoretical backgrounds of different machine learning fields, such as multilabel classification, text mining and learning from weakly labeled data were examined to gain a better ...

  9. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    Science.gov (United States)

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  10. The success of assisted reproduction technologies in relation to composition of the total regulatory T cell (Treg) pool and different Treg subsets.

    Science.gov (United States)

    Schlossberger, V; Schober, L; Rehnitz, J; Schaier, M; Zeier, M; Meuer, S; Schmitt, E; Toth, B; Strowitzki, T; Steinborn, A

    2013-11-01

    Are there differences in composition of the total regulatory T cell (Treg) pool and distinct Treg subsets (naïve CD45RA(+)-Tregs, HLA-DR(-)- and HLA-DR(+)-memory Tregs) between successfully and non-successfully IVF/ICSI-treated women? Non-successfully IVF/ICSI-treated women have a decreased percentage of naïve CD45RA(+)-Tregs and an increased percentage of HLA-DR(-)-memory Tregs within the total Treg pool. Immunosuppressive Tregs play a significant role in human reproduction and studies have shown that their number and function are reduced in reproductive failure and complications of pregnancy such as pre-eclampsia and preterm labor. However, no data exist concerning the importance of Tregs for a successful outcome following assisted reproduction technologies. Blood samples were obtained from 210 women undergoing IVF/ICSI treatment, where 14 patients were excluded due to biochemical pregnancy or missed abortion. Age control blood samples were collected from 20 neonates and 176 healthy female volunteers. The study was performed between October 2010 and March 2012. In this study, we determined prospectively the quantity and composition of the total CD4(+)CD127(low+/-)CD25(+)FoxP3(+)-Treg pool and three different Treg subsets (naïve CD45RA(+)-Tregs, HLA-DR(-)- and HLA-DR(+)-memory Tregs) in all women undergoing IVF/ICSI treatment. We examined whether there were differences between those who became pregnant (n = 36) and those who did not (n = 160). The blood samples were collected within 1 h before the embryo transfer and analyzed by six-color flow cytometry. In order to evaluate these results with regard to the normal age-related changes in composition of the total Treg pool, the same analysis was performed using samples of umbilical cord blood and from healthy female volunteers aged between 17 and 76 years. The composition of the total Treg pool was documented for successfully IVF/ICSI-treated women (n = 5) throughout their pregnancy and we assessed the

  11. 75 FR 21889 - Semiannual Regulatory Agenda

    Science.gov (United States)

    2010-04-26

    ... assistance to small business concerns owned and controlled by women, and to women wishing to start a small... Part XVI Small Business Administration Semiannual Regulatory Agenda

  12. Regulatory agencies and regulatory risk

    OpenAIRE

    Knieps, Günter; Weiß, Hans-Jörg

    2008-01-01

    The aim of this paper is to show that regulatory risk is due to the discretionary behaviour of regulatory agencies, caused by a too extensive regulatory mandate provided by the legislator. The normative point of reference and a behavioural model of regulatory agencies based on the positive theory of regulation are presented. Regulatory risk with regard to the future behaviour of regulatory agencies is modelled as the consequence of the ex ante uncertainty about the relative influence of inter...

  13. Assistance to Oil and Gas State Agencies and Industry through Continuation of Environmental and Production Data Management and a Water Regulatory Initiative

    Energy Technology Data Exchange (ETDEWEB)

    Grunewald, Ben; Arthur, Dan; Langhus, Bruce; Gillespie, Tom; Binder, Ben; Warner, Don; Roberts, Jim; Cox, D.O.

    2002-05-31

    This grant project was a major step toward completion of the Risk Based Data Management System (RBDMS) project. Additionally, the project addresses the needs identified during the project's initial phases. By implementing this project, the following outcomes were sought: (1) State regulatory agencies implemented more formalized environmental risk management practices as they pertain to the production of oil and gas, and injection via Class II wells. (2) Enhancement of oil and gas production by implementing a management system supporting the saving of abandoned or idle wells located in areas with a relatively low environmental risk of endangering underground sources of drinking water (USDWs) in a particular state. (3) Verification that protection of USDWs is adequate and additional restrictions or requirements are not necessary in areas with a relatively low environmental risk. (4) Standardization of data and information maintained by state regulatory agencies and a decrease in the regulatory cost burden on producers operating in multiple states, and (5) Development of a system for electronic data transfer among operators and state regulatory agencies and reduction of overall operator reporting burdens.

  14. Technical assistance for regulatory development: review and evaluation of the EPA standard 40 CFR191 for disposal of high-level waste. Vol. 1

    International Nuclear Information System (INIS)

    Ortiz, N.R.; Wahi, K.K.

    1983-04-01

    The Environmental Protection Agency (EPA) has prepared a draft Standard (40CFR191, Draft 19) which, when finalized, will provide the overall system requirements for the geologic disposal of radioactive waste. This document (Vol. 1) provides an Executive Summary of the work performed at Sandia National Laboratories, Albuquerque, NM, under contract to the US Nuclear Regulatory Commission to analyze certain aspects of the draft Standard. The issues of radionuclide release limits, interpretation, uncertainty, achievability, and assessment of compliance with respect to the requirements of the draft Standard are addressed based on the detailed analyses presented in five companion volumes to this report

  15. Regulatory activities

    International Nuclear Information System (INIS)

    2001-01-01

    This publication, compiled in 8 chapters, presents the regulatory system developed by the Nuclear Regulatory Authority (NRA) of the Argentine Republic. The document describes the following activities and topics: the evolution of the nuclear regulatory activity in Argentina; the Argentine regulatory system; the nuclear regulatory laws and standards; the inspection and safeguards of nuclear facilities; the emergency systems; the environmental systems; the environmental monitoring; the analysis laboratories on physical and biological dosimetry, prenatal irradiation, internal irradiation, radiation measurements, detection techniques on nuclear testing, medical program on radiation protection; the institutional relations with national and international organizations; the training courses and meetings; the technical information

  16. Managing Regulatory Body Competence

    International Nuclear Information System (INIS)

    2013-01-01

    In 2001, the IAEA published TECDOC 1254, which examined the way in which the recognized functions of a regulatory body for nuclear facilities result in competence needs. Using the systematic approach to training (SAT), TECDOC 1254 provided a framework for regulatory bodies for managing training and for developing and maintaining their competence. It has been successfully used by many regulators. The IAEA has also introduced a methodology and an assessment tool - Guidelines for Systematic Assessment of Regulatory Competence Needs (SARCoN) - which provides practical guidance on analysing the training and development needs of a regulatory body and, through a gap analysis, guidance on establishing competence needs and how to meet them. In 2009, the IAEA established a steering committee (supported by a bureau) with the mission to advise the IAEA on how it could best assist Member States to develop suitable competence management systems for their regulatory bodies. The committee recommended the development of a safety report on managing staff competence as an integral part of a regulatory body's management system. This Safety Report was developed in response to this request. It supersedes TECDOC 1254, broadens its application to regulatory bodies for all facilities and activities, and builds upon the experience gained through the application of TECDOC 1254 and SARCoN and the feedback received from Member States. This Safety Report applies to the management of adequate competence as needs change, and as such is equally applicable to the needs of States 'embarking' on a nuclear power programme. It also deals with the special case of building up the competence of regulatory bodies as part of the overall process of establishing an 'embarking' State's regulatory system

  17. Nuclear Regulatory Legislation

    International Nuclear Information System (INIS)

    1989-08-01

    This compilation of statutes and material pertaining to nuclear regulatory legislation through the 100th Congress, 2nd Session, has been prepared by the Office of the General Counsel, US Nuclear Regulatory Commission, with the assistance of staff, for use as an internal resource document. Persons using this document are placed on notice that it may not be used as an authoritative citation in lieu of the primary legislative sources. Furthermore, while every effort has been made to ensure the completeness and accuracy of this material, neither the United States Government, the Nuclear Regulatory Commission, nor any of their employees makes any expressed or implied warranty or assumes liability for the accuracy or completeness of the material presented in this compilation

  18. Enabling legislation and regulatory determinations for a nuclear power programme

    International Nuclear Information System (INIS)

    Ha Vinh Phuong

    1977-01-01

    General remarks on objectives and scope of enabling legislation, on the regulatory body and on the IAEA activities and assistance in regulatory matters e.g. the IAEA Safety Guides which are in preparation. (HP) [de]

  19. Risk-based Regulatory Evaluation Program methodology

    International Nuclear Information System (INIS)

    DuCharme, A.R.; Sanders, G.A.; Carlson, D.D.; Asselin, S.V.

    1987-01-01

    The objectives of this DOE-supported Regulatory Evaluation Program are to analyze and evaluate the safety importance and economic significance of existing regulatory guidance in order to assist in the improvement of the regulatory process for current generation and future design reactors. A risk-based cost-benefit methodology was developed to evaluate the safety benefit and cost of specific regulations or Standard Review Plan sections. Risk-based methods can be used in lieu of or in combination with deterministic methods in developing regulatory requirements and reaching regulatory decisions

  20. Regulatory Control of Radiation Sources. Safety Guide

    International Nuclear Information System (INIS)

    2009-01-01

    This Safety Guide is intended to assist States in implementing the requirements established in Safety Standards Series No. GS-R-1, Legal and Governmental Infrastructure for Nuclear, Radiation, Radioactive Waste and Transport Safety, for a national regulatory infrastructure to regulate any practice involving radiation sources in medicine, industry, research, agriculture and education. The Safety Guide provides advice on the legislative basis for establishing regulatory bodies, including the effective independence of the regulatory body. It also provides guidance on implementing the functions and activities of regulatory bodies: the development of regulations and guides on radiation safety; implementation of a system for notification and authorization; carrying out regulatory inspections; taking necessary enforcement actions; and investigating accidents and circumstances potentially giving rise to accidents. The various aspects relating to the regulatory control of consumer products are explained, including justification, optimization of exposure, safety assessment and authorization. Guidance is also provided on the organization and staffing of regulatory bodies. Contents: 1. Introduction; 2. Legal framework for a regulatory infrastructure; 3. Principal functions and activities of the regulatory body; 4. Regulatory control of the supply of consumer products; 5. Functions of the regulatory body shared with other governmental agencies; 6. Organization and staffing of the regulatory body; 7. Documentation of the functions and activities of the regulatory body; 8. Support services; 9. Quality management for the regulatory system.

  1. Regulatory and licensee surveys

    International Nuclear Information System (INIS)

    2009-01-01

    Prior to the workshop, two CSNI/WGHOF surveys were distributed. One survey was directed at regulatory bodies and the other was directed at plant licensees. The surveys were: 1 - Regulatory Expectations of Licensees' Arrangements to Ensure Suitable Organisational Structure, Resources and Competencies to Manage Safety (sent to WGHOF regulatory members). The survey requested that the respondents provide a brief overview of the situation related to plant organisations in their country, their regulatory expectations and their formal requirements. The survey addressed three subjects: the demonstration and documentation of organisational structures, resources and competencies; organisational changes; and issues for improvement (for both current and new plants). Responses were received from eleven regulatory bodies. 2 - Approaches to Justify Organisational Suitability (sent to selected licensees). The purpose of the survey was to gain an understanding of how licensees ensure organisational suitability, resources and competencies. This information was used to assist in the development of the issues and subjects that were addressed at the group discussion sessions. Responses were received from over fifteen licensees from nine countries. The survey requested that the licensees provide information on how they ensure effective organisational structures at their plants. The survey grouped the questions into the following four categories: organisational safety functions; resource and competence; decision-making and communication; and good examples and improvement needs. The findings from these surveys were used in conjunction with other factors to identify the key issues for the workshop discussion sessions. The responses from these two surveys are discussed briefly in Sections 4 and 5 of this report. More extensive reviews of the regulatory and licensee responses are provided in Appendix 1

  2. Annotated chemical patent corpus: A gold standard for text mining

    NARCIS (Netherlands)

    S.A. Akhondi (Saber); A.G. Klenner (Alexander G.); C. Tyrchan (Christian); A.K. Manchala (Anil K.); K. Boppana (Kiran); D. Lowe (Daniel); M. Zimmermann (Marc); S.A.R.P. Jagarlapudi (Sarma A. R. P.); R. Sayle (Roger); J.A. Kors (Jan); C. Muresan (Cornelia)

    2014-01-01

    Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting

  3. Science and Technology Text Mining: Hypersonic and Supersonic Flow

    Science.gov (United States)

    2003-11-17

    Saussure , 1949]. A summary of co-word origins, and evolution of co-word into computational linguistics, can be found in Kostoff [1993b]. Co-word...Global Thesauri. Information Processing and Management. 26:5. 1990. De Saussure , F. (1949). Cours de Linguistique Generale. 4eme Edition

  4. Churn prediction based on text mining and CRM data analysis

    OpenAIRE

    Schatzmann, Anders; Heitz, Christoph; Münch, Thomas

    2014-01-01

    Within quantitative marketing, churn prediction on a single customer level has become a major issue. An extensive body of literature shows that, today, churn prediction is mainly based on structured CRM data. However, in the past years, more and more digitized customer text data has become available, originating from emails, surveys or scripts of phone calls. To date, this data source remains vastly untapped for churn prediction, and corresponding methods are rarely described in literature. ...

  5. Text mining for metabolic reaction extraction from scientific literature

    NARCIS (Netherlands)

    Risse, J.E.

    2014-01-01

    Science relies on data in all its different forms. In molecular biology and bioinformatics in particular large scale data generation has taken centre stage in the form of high-throughput experiments. In line with this exponential increase of experimental data has been the near exponential growth

  6. Text mining and IRT for psychiatric and psychological assessment

    NARCIS (Netherlands)

    He, Qiwei

    2013-01-01

    The information age has made it easy to store and process large amounts of data, including both structured data (e.g., responses to questionnaires) and unstructured data (e.g., natural language or prose). As an additional source of information in assessments, textual data has been increasingly used

  7. Application of text mining for customer evaluations in commercial banking

    Science.gov (United States)

    Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.

    2015-07-01

    Customer attrition is an increasingly serious problem for commercial banks. To combat this problem thoroughly, mining customer evaluation texts is as important as mining structured customer data. In order to extract hidden information from customer evaluations, Textual Feature Selection, Classification and Association Rule Mining are necessary techniques. This paper presents all three techniques, using Chinese Word Segmentation, C5.0 and Apriori, and a set of experiments was run on a collection of real textual data comprising 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions and some advice for the commercial bank are given in this paper.

  8. Science and Technology Text Mining: Mexico Core Competencies

    Science.gov (United States)

    2002-01-01

    government created a Researchers Fellowship ( Sistema Nacional de Investigadores-SNI). In this system, the government recognizes the research...Semantic Networks from Text using Leximancer. HLT-NAACL 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics - Companion Volume. ACL, May 2003. Demo23-Demo24. 117

  9. Cognition-Based Approaches for High-Precision Text Mining

    Science.gov (United States)

    Shannon, George John

    2017-01-01

    This research improves the precision of information extraction from free-form text via the use of cognitive-based approaches to natural language processing (NLP). Cognitive-based approaches are an important, and relatively new, area of research in NLP and search, as well as linguistics. Cognitive approaches enable significant improvements in both…

  10. Discovering Genres of Online Discussion Threads via Text Mining

    Science.gov (United States)

    Lin, Fu-Ren; Hsieh, Lu-Shih; Chuang, Fu-Tai

    2009-01-01

    As course management systems (CMS) gain popularity in facilitating teaching, a forum is a key component to facilitate the interactions among students and teachers. Content analysis is the most popular way to study a discussion forum. But content analysis is a labor-intensive process; for example, the coding process relies heavily on manual…

  11. Searching for Significance in Unstructured Data: Text Mining with Leximancer

    Science.gov (United States)

    Thomas, David A.

    2014-01-01

    Scholars in many knowledge domains rely on sophisticated information technologies to search for and retrieve records and publications pertinent to their research interests. But what is a scholar to do when a search identifies hundreds of documents, any of which might be vital or irrelevant to his or her work? The problem is further complicated by…

  12. Text mining electronic health records to identify hospital adverse events

    DEFF Research Database (Denmark)

    Gerdes, Lars Ulrik; Hardahl, Christian

    2013-01-01

    Manual reviews of health records to identify possible adverse events are time consuming. We are developing a method based on natural language processing to quickly search electronic health records for common triggers and adverse events. Our results agree fairly well with those obtained using manu...

  13. Addressing Information Proliferation: Applications of Information Extraction and Text Mining

    Science.gov (United States)

    Li, Jingjing

    2013-01-01

    The advent of the Internet and the ever-increasing capacity of storage media have made it easy to store, deliver, and share enormous volumes of data, leading to a proliferation of information on the Web, in online libraries, on news wires, and almost everywhere in our daily lives. Since our ability to process and absorb this information remains…

  14. Recommending personally interested contents by text mining, filtering, and interfaces

    Science.gov (United States)

    Xu, Songhua

    2015-10-27

    A personalized content recommendation system includes a client interface device configured to monitor a user's information data stream. A collaborative filter remote from the client interface device generates automated predictions about the interests of the user. A database server stores personal behavioral profiles and user's preferences based on a plurality of monitored past behaviors and an output of the collaborative user personal interest inference engine. A programmed personal content recommendation server filters items in an incoming information stream with the personal behavioral profile and identifies only those items of the incoming information stream that substantially matches the personal behavioral profile. The identified personally relevant content is then recommended to the user following some priority that may consider the similarity between the personal interest matches, the context of the user information consumption behaviors that may be shown by the user's content consumption mode.

  15. Text-Mining Applications for Creation of Biofilm Literature Database

    Directory of Open Access Journals (Sweden)

    Kanika Gupta

    2017-10-01

    In the present research, a published corpus of 34306 documents on biofilms was collected from the PubMed database along with non-indexed resources such as books, conferences and newspaper articles, and these were divided into five categories: classification, growth and development, physiology, drug effects and radiation effects. Each of the five categories was further divided into three parts (Journal Title, Abstract Title and Abstract Text) to make indexing highly specific. Text processing was done using the software Rapid Miner_v5.3, which tokenizes the entire text into words and provides the frequency of each word within the document. The obtained words were normalized using the Remove Stop and Stem Word commands of Rapid Miner_v5.3, which remove stop words and reduce words to their stems. The obtained words were stored in MS-Excel 2007 and sorted in decreasing order of frequency using the Sort & Filter command of MS-Excel 2007. The words were visualized through networks obtained with Cytoscape_v2.7.0. The words obtained were highly specific for biofilms, generating a controlled biofilm vocabulary that could be used for indexing articles on biofilms (similar to the MeSH database, which indexes articles for PubMed). The obtained keyword information was stored in a relational database that is locally hosted using the WAMP_v2.4 (Windows, Apache, MySQL, PHP) server. The available biofilm vocabulary will be significant for researchers studying biofilm literature, making their search easy and efficient.
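
    The text-processing steps described in this abstract (tokenization, stop-word removal, stemming, frequency counting and sorting by decreasing frequency) can be sketched in a few lines of Python. This is only an illustrative sketch, not the RapidMiner workflow the authors used: the stop-word list, the suffix-stripping rule and the sample sentence are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real pipelines use far larger ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Crude suffix stripping for illustration only (not the Porter algorithm)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequencies(text):
    """Return (term, count) pairs sorted by decreasing count, then alphabetically."""
    terms = [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
    counts = Counter(terms)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample text, not taken from the biofilm corpus.
sample = "Biofilms form on surfaces. Biofilm growth and the forming of biofilms are studied."
frequencies = term_frequencies(sample)
```

    The sorted list plays the role of the frequency-ordered spreadsheet in the abstract; a real vocabulary-building effort would likely swap in an established stemmer and a curated stop-word list.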

  16. Trace of Knowledge: Benchmarking Novel Text Mining Based Measurements

    DEFF Research Database (Denmark)

    Woltmann, Sabrina

    The impact of public research outcomes on economies, and societies, in particular, in terms of innovation and development is widely accepted and empirically investigated [9, 3]. However, many studies suggest a systematic underestimation of the impact and benefits of public research. Empirical stu...

  17. PaperBLAST: Text Mining Papers for Information about Homologs.

    Science.gov (United States)

    Price, Morgan N; Arkin, Adam P

    2017-01-01

    Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
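
    The idea of searching article text for gene or protein identifiers and returning snippets around each hit can be illustrated with a small Python sketch. It is not PaperBLAST's implementation: the function name `find_mentions`, the identifiers `geneX1`, `protY2` and `geneZ9`, and the two article texts are hypothetical, and a real system would additionally link each identifier to a protein sequence and use BLAST to find similar proteins.

```python
import re


def find_mentions(articles, identifiers, context=30):
    """Map each identifier to (article_id, snippet) pairs where it occurs as a whole word."""
    mentions = {}
    for identifier in identifiers:
        # re.escape guards against identifiers containing regex metacharacters.
        pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
        for article_id, text in articles.items():
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                mentions.setdefault(identifier, []).append((article_id, text[start:end]))
    return mentions


# Made-up article texts and identifiers, for illustration only.
articles = {
    "article-1": "Deletion of geneX1 altered cell shape in our screen.",
    "article-2": "We found no phenotype for the protY2 mutant, unlike geneX1.",
}
mentions = find_mentions(articles, ["geneX1", "protY2", "geneZ9"])
```

    Identifiers with no hits are simply absent from the result, so a caller can distinguish proteins that are discussed in the literature from those that are not.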

  18. Seqenv: linking sequences to environments through text mining

    Czech Academy of Sciences Publication Activity Database

    Sinclair, L.; Ijaz, U.Z.; Jensen, L.J.; Coolen, M.J.L.; Gubry-Rangin, C.; Chroňáková, Alica; Oulas, A.; Pavloudi, Ch.; Schnetzer, J.; Weimann, A.; Ijaz, A.; Eiler, A.; Quince, Ch.; Pafilis, E.

    2016-01-01

    Vol. 4, December (2016), article no. e2690. ISSN 2167-8359 Institutional support: RVO:60077344 Keywords : bioinformatics * ecology * microbiology * genomics * sequence analysis * text processing Subject RIV: EH - Ecology, Behaviour Impact factor: 2.177, year: 2016

  19. The Application of Text Mining in Business Research

    DEFF Research Database (Denmark)

    Preuss, Bjørn

    2017-01-01

    The aim of this paper is to present a methodological concept in business research that has the potential to become one of the most powerful methods in the upcoming years when it comes to researching qualitative phenomena in business and society. It presents a selection of algorithms as well elaborates...

  20. Analyzing asset management data using data and text mining.

    Science.gov (United States)

    2014-07-01

    Predictive models using text from a sample of competitively bid California highway projects have been used to predict a construction project's likely level of cost overrun. A text description of the project and the text of the five largest project line...

  1. PaperBLAST: Text Mining Papers for Information about Homologs

    International Nuclear Information System (INIS)

    Price, Morgan N.; Arkin, Adam P.

    2017-01-01

    Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.

  2. Text Mining in Python through the HTRC Feature Reader

    Directory of Open Access Journals (Sweden)

    Peter Organisciak

    2016-11-01

    We introduce a toolkit for working with the 13.6 million volume Extracted Features Dataset from the HathiTrust Research Center. You will learn how to peer at the words and trends of any book in the collection, while developing broadly useful Python data analysis skills. The HathiTrust holds nearly 15 million digitized volumes from libraries around the world. In addition to their individual value, these works in aggregate are extremely valuable for historians. Spanning many centuries and genres, they offer a way to learn about large-scale trends in history and culture, as well as evidence for changes in language or even the structure of the book. To simplify access to this collection, the HathiTrust Research Center (HTRC) has released the Extracted Features dataset (Capitanu et al. 2015): a dataset that provides quantitative information describing every page of every volume in the collection. In this lesson, we introduce the HTRC Feature Reader, a library for working with the HTRC Extracted Features dataset using the Python programming language. The HTRC Feature Reader is structured to support work using popular data science libraries, particularly Pandas. Pandas provides simple structures for holding data and powerful ways to interact with it. The HTRC Feature Reader uses these data structures, so learning how to use it will also cover general data analysis skills in Python.

  3. Discovering the Language of Wine Reviews : A Text Mining Account

    NARCIS (Netherlands)

    Lefever, Els; Hendrickx, Iris; Croijmans, Ilja; van den Bosch, A.; Majid, Asifa

    It is widely held that smells and flavors are impossible to put into words. In this paper we test this claim by seeking predictive patterns in wine reviews, which ostensibly aim to provide guides to perceptual content. Wine reviews have previously been critiqued as random and meaningless. We

  4. Roles for text mining in protein function prediction.

    Science.gov (United States)

    Verspoor, Karin M

    2014-01-01

    The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computational protein function prediction has therefore emerged as a problem at the forefront of modern biology (Radivojac et al., Nat Methods 10(13):221-227, 2013). Within the varied approaches to computational protein function prediction that have been explored, there are several that make use of biomedical literature mining. These methods take advantage of information in the published literature to associate specific proteins with specific protein functions. In this chapter, we introduce two main strategies for doing this: association of function terms, represented as Gene Ontology terms (Ashburner et al., Nat Genet 25(1):25-29, 2000), to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.

  5. Assessment of regulatory effectiveness. Peer discussions on regulatory practices

    International Nuclear Information System (INIS)

    1999-09-01

    regulatory approaches of the regulatory body and its organization are important factors. Whilst regulatory effectiveness cannot easily be measured directly, there are various characteristics which can be attributed to an effective regulatory body. These characteristics can be used as indicators. They can also provide guidance on the assessment of regulatory effectiveness. They may also indicate possible fields of enhancement of the effectiveness of a regulatory body. In order to assist Member States in achieving and maintaining a high level of regulatory effectiveness, the IAEA convened the seventh series of peer discussions on 'Assessment of Regulatory Effectiveness'. The results and findings of these discussions are summarized in this report which concentrates on common findings and good practices identified during the discussions. Its intention is primarily to disseminate information on existing experience and to identify beneficial aspects of practices in order to provide guidance to Member States. This report is structured so that it covers the subject matter under the following main headings: Elements of an Effective Regulatory Body; Possible Indicators of Regulatory Effectiveness; Assessment and Suggestions for Good Practices to Enhance Effectiveness. It is important to note that recommendations of good practice are included if they have been identified by at least one of the groups. It does not follow that all of the groups or individual Member States would necessarily endorse all of the recommendations. However, it is considered that if a single group of senior regulators judge that a particular practice is worthy of recommendation, it needs to receive serious consideration. In some cases the same recommendations arise from all of the groups. These are considered to be particularly meritorious

  6. Regulatory Anatomy

    DEFF Research Database (Denmark)

    Hoeyer, Klaus

    2015-01-01

    This article proposes the term “safety logics” to understand attempts within the European Union (EU) to harmonize member state legislation to ensure a safe and stable supply of human biological material for transplants and transfusions. With safety logics, I refer to assemblages of discourses, le...... they arise. In short, I expose the regulatory anatomy of the policy landscape....

  7. Regulatory Governance

    DEFF Research Database (Denmark)

    Kjær, Poul F.; Vetterlein, Antje

    2018-01-01

    Regulatory governance frameworks have become essential building blocks of world society. From supply chains to the regimes surrounding international organizations, extensive governance frameworks have emerged which structure and channel a variety of social exchanges, including economic, political...... by the International Transitional Administrations (ITAs) in Kosovo and Iraq as well as global supply chains and their impact on the garment industry in Bangladesh....

  8. Text-mining: the new generation of scientific literature analysis in molecular biology and genomics

    Directory of Open Access Journals (Sweden)

    Carmen Gálvez

    2008-01-01

    Once the human genome sequence had been deciphered, the research paradigm shifted toward describing the functions of genes and toward future advances in the fight against disease. This new context has drawn the interest of bioinformatics, which combines methods from the life sciences and the information sciences to make accessible the large amount of biological information stored in databases, and of genomics, which studies the interactions of genes and their influence on the development of diseases. In this context, text mining has emerged as a new instrument for the analysis of the scientific literature. A common text-mining task in molecular biology and genomics is the recognition of biological entities such as genes, proteins and diseases. The next step in the mining process is the identification of relations between biological entities, such as the type of gene-gene, gene-disease or gene-protein interaction, in order to interpret biological functions or formulate research hypotheses. The aim of this work is to examine the rise and the limitations of the new generation of tools for analysing natural-language information stored in bibliographic databases such as PubMed or MEDLINE.

  9. Strengthening Regulatory Competence in Pakistan

    International Nuclear Information System (INIS)

    Sadiq, M.

    2016-01-01

    Capacity building of the Pakistan Nuclear Regulatory Authority (PNRA) is considered an essential element in pursuit of its vision to become a world class regulatory body. Since its inception in 2001, PNRA has continuously endeavoured to invest in its people, develop training infrastructure and impart sound knowledge and professional skills with the aim of improving its regulatory effectiveness. The use of nuclear and radioactive material in Pakistan has increased manifold in recent years, so more manpower was needed for regulatory oversight. PNRA adopted a two-pronged approach to meet the manpower demand: (a) employment of university graduates through a fast track recruitment drive and (b) induction of graduates by offering fellowships for Master degree programs. Although the newly employed staff were selected on the basis of their excellent academic qualifications in basic and applied sciences, they required rigorous knowledge and skills in regulatory perspectives. In order to implement a structured training program, PNRA conducted a Training Needs Assessment (TNA) and identified competency gaps of the regulatory staff in the legal, technical, regulatory practice and behavioural domains. PNRA took several initiatives for capacity building, which included establishment of a training centre for sustainability of trainings, initiation of a fellowship scheme for Master programs, attachment of staff at local institutes for on-the-job training, and placement at foreign regulatory bodies and organizations for technical development with the assistance of the IAEA. These strategies have been very beneficial in building the competence of the PNRA staff to perform all regulatory activities indigenously for nuclear power plants, research reactors and radiation facilities. Provision of vibrant technical support to the IAEA and Member States in various programs by PNRA is a landmark of these competence development efforts. This paper summarizes PNRA initiatives and the International Atomic

  10. 30 CFR 795.11 - Assistance funding.

    Science.gov (United States)

    2010-07-01

    ... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SMALL OPERATOR ASSISTANCE PERMANENT REGULATORY PROGRAM-SMALL OPERATOR ASSISTANCE PROGRAM § 795.11 Assistance... eligible small operators if available funds are less than those required to provide the services pursuant...

  11. Regulatory authority information system RAIS

    International Nuclear Information System (INIS)

    Ortiz, P.; Mrabit, K.; Miaw, S.

    2000-01-01

    In this lecture the principles of the regulatory authority information system (RAIS) are presented. RAIS is a tool currently being developed by the IAEA for the Regulatory Authorities. It is a part of a set of supporting actions designed to assist member states in achieving the objectives of the Model project on radiation and waste safety infrastructure. RAIS is a tool that provides the management of the Regulatory Authority with the key information needed for the planning and implementation of activities and to ensure confidence that resources are optimally used. The RAIS contains five modules: Inventory of installations and radiation sources; Authorization process; Inspection and follow-up actions; Information on personal dosimetry; Assessment of effectiveness by means of performance indicators

  12. Regulatory Physiology

    Science.gov (United States)

    Lane, Helen W.; Whitson, Peggy A.; Putcha, Lakshmi; Baker, Ellen; Smith, Scott M.; Stewart, Karen; Gretebeck, Randall; Nimmagudda, R. R.; Schoeller, Dale A.; Davis-Street, Janis

    1999-01-01

    As noted elsewhere in this report, a central goal of the Extended Duration Orbiter Medical Project (EDOMP) was to ensure that cardiovascular and muscle function were adequate to perform an emergency egress after 16 days of spaceflight. The goals of the Regulatory Physiology component of the EDOMP were to identify and subsequently ameliorate those biochemical and nutritional factors that deplete physiological reserves or increase risk for disease, and to facilitate the development of effective muscle, exercise, and cardiovascular countermeasures. The component investigations designed to meet these goals focused on biochemical and physiological aspects of nutrition and metabolism, the risk of renal (kidney) stone formation, gastrointestinal function, and sleep in space. Investigations involved both ground-based protocols to validate proposed methods and flight studies to test those methods. Two hardware tests were also completed.

  13. Regulatory Benchmarking

    DEFF Research Database (Denmark)

    Agrell, Per J.; Bogetoft, Peter

    2017-01-01

    Benchmarking methods, and in particular Data Envelopment Analysis (DEA), have become well-established and informative tools for economic regulation. DEA is now routinely used by European regulators to set reasonable revenue caps for energy transmission and distribution system operators. The application of benchmarking in regulation, however, requires specific steps in terms of data validation, model specification and outlier detection that are not systematically documented in open publications, leading to discussions about regulatory stability and economic feasibility of these techniques...

  14. Regulatory Benchmarking

    DEFF Research Database (Denmark)

    Agrell, Per J.; Bogetoft, Peter

    2017-01-01

    Benchmarking methods, and in particular Data Envelopment Analysis (DEA), have become well-established and informative tools for economic regulation. DEA is now routinely used by European regulators to set reasonable revenue caps for energy transmission and distribution system operators. The application of benchmarking in regulation, however, requires specific steps in terms of data validation, model specification and outlier detection that are not systematically documented in open publications, leading to discussions about regulatory stability and economic feasibility of these techniques...

  15. Regulatory Assistance, Stakeholder Outreach, and Coastal and Marine Spatial Planning Activities In Support Marine and Hydrokinetic Energy Deployment: Task 2.1.7 Permitting and Planning Fiscal Year 2012 Year-End Report

    Energy Technology Data Exchange (ETDEWEB)

    Geerlofs, Simon H. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hanna, Luke A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Judd, Chaeli R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Blake, Kara M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2012-09-01

    This fiscal year 2012 year-end report summarizes activities carried out under DOE Water Power task 2.1.7, Permitting and Planning. Activities under Task 2.1.7 address the concerns of a wide range of stakeholders with an interest in the development of the MHK industry, including regulatory and resource management agencies, tribes, NGOs, and industry.

  16. Application of Resource Portfolio Concept in Nuclear Regulatory Infrastructure Support

    International Nuclear Information System (INIS)

    Lee, Y. E.; Ha, J. T.; Chang, H. S.; Kam, S. C.; Ryu, Y. H.

    2010-01-01

    As new entrants to the global nuclear construction market increase and the establishment of an effective and sustainable regulatory infrastructure becomes more important, they have requested assistance from the international nuclear communities with mature nuclear regulatory programmes. The regulatory organizations providing support to the regulatory infrastructure of newcomers need to optimize the use of their limited resources. This paper suggests a resource portfolio concept, like the GE/McKinsey Matrix used in business management, and tries to apply it to the current needs considered in the regulatory support program in Korea as a case study

  17. Regulatory Control of Radiation Sources. Safety Guide (Arabic Edition)

    International Nuclear Information System (INIS)

    2012-01-01

    This Safety Guide is intended to assist States in implementing the requirements established in Safety Standards Series No. GS-R-1, Legal and Governmental Infrastructure for Nuclear, Radiation, Radioactive Waste and Transport Safety, for a national regulatory infrastructure to regulate any practice involving radiation sources in medicine, industry, research, agriculture and education. The Safety Guide provides advice on the legislative basis for establishing regulatory bodies, including the effective independence of the regulatory body. It also provides guidance on implementing the functions and activities of regulatory bodies: the development of regulations and guides on radiation safety; implementation of a system for notification and authorization; carrying out regulatory inspections; taking necessary enforcement actions; and investigating accidents and circumstances potentially giving rise to accidents. The various aspects relating to the regulatory control of consumer products are explained, including justification, optimization of exposure, safety assessment and authorization. Guidance is also provided on the organization and staffing of regulatory bodies. Contents: 1. Introduction; 2. Legal framework for a regulatory infrastructure; 3. Principal functions and activities of the regulatory body; 4. Regulatory control of the supply of consumer products; 5. Functions of the regulatory body shared with other governmental agencies; 6. Organization and staffing of the regulatory body; 7. Documentation of the functions and activities of the regulatory body; 8. Support services; 9. Quality management for the regulatory system.

  18. Assistive Technology

    Science.gov (United States)

    ... Assistive technology (AT) is any service or tool that helps ... be difficult or impossible. For older adults, such technology may be a walker to improve mobility or ...

  19. NRC regulatory information conference: Proceedings

    International Nuclear Information System (INIS)

    1989-09-01

    This volume of the report provides the proceedings from the Nuclear Regulatory Commission (NRC) Regulatory Information Conference that was held at the Mayflower Hotel, Washington, DC, on April 18, 19, and 20, 1989. This conference was held by the NRC and chaired by Dr. Thomas E. Mosley, Director, Office of Nuclear Reactor Regulations (NRR) and coordinated by S. Singh Bajwa, Chief, Technical Assistance Management Section, NRR. There were approximately 550 participants from nine countries at the conference. The countries represented were Canada, England, Italy, Japan, Mexico, Spain, Taiwan, Yugoslavia, and the United States. The NRC staff discussed with nuclear industry its regulatory philosophy and approach and the bases on which they have been established. Furthermore, the NRC staff discussed several initiatives that have been implemented recently and their bases as well as NRC's expectations for new initiatives to further improve safety. The figures contained in Appendix A to the volume correspond to the slides that were shown during the presentations. Volume 2 of this report contains the formal papers that were distributed at the beginning of the Regulatory Information Conference and other information about the conference

  20. Environment, safety, and health regulatory implementation plan

    International Nuclear Information System (INIS)

    1993-01-01

    To identify, document, and maintain the Uranium Mill Tailings Remedial Action (UMTRA) Project's environment, safety, and health (ES&H) regulatory requirements, the US Department of Energy (DOE) UMTRA Project Office tasked the Technical Assistance Contractor (TAC) to develop a regulatory operating envelope for the UMTRA Project. The system selected for managing the UMTRA regulatory operating envelope database is based on the Integrated Project Control/Regulatory Compliance System (IPC/RCS) developed by WASTREN, Inc. (WASTREN, 1993). The IPC/RCS is a tool used for identifying regulatory and institutional requirements and indexing them to hardware, personnel, and program systems on a project. The IPC/RCS will be customized for the UMTRA Project surface remedial action and groundwater restoration programs. The purpose of this plan is to establish the process for implementing and maintaining the UMTRA Project's regulatory operating envelope, which involves identifying all applicable regulatory and institutional requirements and determining compliance status. The plan describes how the Project will identify ES&H regulatory requirements, analyze applicability to the UMTRA Project, and evaluate UMTRA Project compliance status

  1. Assisted Living

    Science.gov (United States)

    ... it, too. What is the Cost for Assisted Living? Although assisted living costs less than nursing home ... Primarily, older persons or their families pay the cost of assisted living. Some health and long-term care insurance policies ...

  2. Regulatory aspects of NPP safety

    International Nuclear Information System (INIS)

    Kastchiev, G.

    1999-01-01

    Extensive review of the NPP Safety is presented including tasks of Ministry of Health, Ministry of Internal Affairs, Ministry of Environment and Waters, Ministry of Defense in the field of national system for monitoring the nuclear power. In the frame of national nuclear safety legislation Bulgaria is in the process of approximation of the national legislation to that of EC. Detailed analysis of the status of regulatory body, its functions, organisation structure, responsibilities and future tasks is included. Basis for establishing the system of regulatory inspections and safety enforcement as well as intensification of inspections is described. Assessment of safety modifications is concerned with complex program for reconstruction of Units 1-4 of Kozloduy NPP, as well as for modernisation of Units 5 and 6. Qualification and licensing of the NPP personnel, Year 2000 problem, priorities and the need of international assistance are mentioned

  3. Navigating "Assisted Dying".

    Science.gov (United States)

    Schipper, Harvey

    2016-02-01

    Carter is a bellwether decision, an adjudication on a narrow point of law whose implications are vast across society, and whose impact may not be realized for years. Coupled with Quebec's Act Respecting End-of-life Care it has sharply changed the legal landscape with respect to actively ending a person's life. "Medically assisted dying" will be permitted under circumstances, and through processes, which have yet to be operationally defined. This decision carries with it moral assumptions, which mean that it will be difficult to reach a unifying consensus. For some, the decision and Act reflect a modern acknowledgement of individual autonomy. For others, allowing such acts is morally unspeakable. Having opened Pandora's Box, the question becomes one of navigating a tolerable societal path. I believe it is possible to achieve a workable solution based on the core principle that "medically assisted dying" should be a very rarely employed last option, subject to transparent ongoing review, specifically as to why it was deemed necessary. My analysis is based on: 1. The societal conditions which have fostered demand for "assisted dying", 2. Actions in other jurisdictions, 3. Carter and Quebec Bill 52, 4. Political considerations, 5. Current medical practice. This leads to a series of recommendations regarding: 1. Legislation and regulation, 2. The role of professional regulatory agencies, 3. Medical professions education and practice, 4. Public education, 5. Health care delivery and palliative care. Given the burden of public opinion, and the legal steps already taken, a process for assisted-dying is required. However, those legal and regulatory steps should only be considered a necessary and defensive first step in a two-stage process. The larger goal, the second step, is to drive the improvement of care, and thus minimize assisted-dying.

  4. As to achieve regulatory action, regulatory approaches

    International Nuclear Information System (INIS)

    Cid, R.; Encinas, D.

    2014-01-01

    Achieving effectiveness in the performance of a nuclear regulatory body has been a permanent challenge in the recent history of nuclear regulation. In the post-Fukushima era this challenge is even more important. This article addresses the subject from two complementary points of view: the characteristics of an effective regulatory body and the regulatory approaches. This work is based on the most recent studies carried out by the Committee on Nuclear Regulatory Activities, CNRA (OECD/NEA), as well as on the experience of the Consejo de Seguridad Nuclear, CSN, the Spanish regulatory body. Rafael Cid is the representative of CSN in these projects; Diego Encinas has participated in the study on regulatory approaches. (Author)

  5. Professional and Regulatory Search

    Science.gov (United States)

    Professional and Regulatory search are designed for people who use EPA web resources to do their job. You will be searching collections where information that is not relevant to Environmental and Regulatory professionals.

  6. 13 CFR 108.585 - Voluntary decrease in NMVC Company's Regulatory Capital.

    Science.gov (United States)

    2010-01-01

    ...'s Regulatory Capital. 108.585 Section 108.585 Business Credit and Assistance SMALL BUSINESS ADMINISTRATION NEW MARKETS VENTURE CAPITAL ("NMVC") PROGRAM Managing the Operations of a NMVC Company Voluntary Decrease in Regulatory Capital § 108.585 Voluntary decrease in NMVC Company's Regulatory Capital. You must...

  7. 10 CFR 26.35 - Employee assistance programs.

    Science.gov (United States)

    2010-01-01

    ... 10 Energy 1 2010-01-01 2010-01-01 false Employee assistance programs. 26.35 Section 26.35 Energy NUCLEAR REGULATORY COMMISSION FITNESS FOR DUTY PROGRAMS Program Elements § 26.35 Employee assistance... to safely and competently perform their duties. Employee assistance programs must be designed to...

  8. Establishing exemption and clearance criteria by the regulatory authority

    International Nuclear Information System (INIS)

    Salih, A.E.A.

    2012-04-01

    This Project work discusses the relationship between the concepts of exemption and clearance, and their practical use in the overall scheme of regulatory control of practices. It also discusses how exemption and clearance are established and the scope of their application for regulatory control. The concept of general clearance levels for any type of material and any possible pathway of disposal is also introduced in this work. Guidance of the Group of Experts establishing scenarios for general clearance, parameter values, and a nuclide-specific list of calculated clearance levels is also presented. Regulatory authorities are required to develop guidance on exemption and clearance levels to assist licensees and registrants to know which practices and sources within practices are exempted from regulatory control and those to be cleared from further controls. Exemption and clearance levels are tools for assisting the Regulatory Authority to optimize the use of resources. (author)

  9. Future nuclear regulatory challenges

    International Nuclear Information System (INIS)

    Royen, J.

    1998-01-01

    In December 1996, the NEA Committee on Nuclear Regulatory Activities concluded that changes resulting from economic deregulation and other recent developments affecting nuclear power programmes have consequences both for licensees and regulatory authorities. A number of potential problems and issues which will present a challenge to nuclear regulatory bodies over the next ten years have been identified in a report just released. (author)

  10. Competent authority regulatory control of the transport of radioactive material

    International Nuclear Information System (INIS)

    1987-04-01

    The purpose of this guide is to assist competent authorities in regulating the transport of radioactive materials and to assist users of transport regulations in their interactions with competent authorities. The guide should assist specifically those countries which are establishing their regulatory framework and further assist countries with established procedures to harmonize their application and implementation of the IAEA Regulations. This guide specifically covers various aspects of the competent authority implementation of the IAEA Regulations for the Safe Transport of Radioactive Material. In addition, physical protection and safeguards control of the transport of nuclear materials as well as third party liability aspects are briefly discussed. This is because they have to be taken into account in overall transport regulatory activities, especially when establishing the regulatory framework

  11. Regulatory activities; Actividades regulatorias

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-07-01

    This publication, compiled in 8 chapters, presents the regulatory system developed by the Nuclear Regulatory Authority (NRA) of the Argentine Republic. The activities and topics described in this document are: the evolution of the nuclear regulatory activity in Argentina; the Argentine regulatory system; the nuclear regulatory laws and standards; the inspection and safeguards of nuclear facilities; the emergency systems; the environmental systems; the environmental monitoring; the analysis laboratories on physical and biological dosimetry, prenatal irradiation, internal irradiation, radiation measurements, detection techniques on nuclear testing, medical program on radiation protection; the institutional relations with national and international organizations; the training courses and meetings; the technical information.

  12. International regulatory activities

    International Nuclear Information System (INIS)

    Anon.

    2004-01-01

    The 48th session of the IAEA General Conference was held in Vienna from 20 to 24 September 2004 with the participation of delegates from 125 Member States and representatives of various international organisations. A number of resolutions were adopted by the conference in the following fields: nuclear safety, radiation, transport and waste safety. The General Conference also adopted a resolution on measures to protect against nuclear terrorism. The Director General decided in 2003 to appoint a group of experts to explore and advise on issues related to nuclear liability. This group, called the International Expert Group on Nuclear Liability (I.N.L.E.X.), consists of 20 expert members from nuclear power and non nuclear power countries and from shipping and non shipping states. It serves three major functions: to create a forum of expertise to explore and advise on issues related to nuclear liability; to enhance global adherence by nuclear and non nuclear states to an effective nuclear liability regime, inter alia, on the basis of the Convention on Supplementary Compensation for Nuclear Damage and the annex thereto, the Vienna Convention on Civil Liability for Nuclear Damage, the Paris Convention on Third Party Liability in the Field of Nuclear Energy, the Joint Protocol relating to the application of the Vienna Convention and the Paris Convention and the amendments thereto; and to assist in the development and strengthening of the national nuclear liability legal frameworks in IAEA Member States to protect the public and the environment and to enhance nuclear safety. The second part of international regulatory concerns a directive on public access to environmental information made by the European Parliament. (N.C.)

  13. Regulatory reform in Mexico's natural gas sector

    International Nuclear Information System (INIS)

    1996-01-01

    In recent years Mexico has implemented remarkable structural changes in its economy. However, until recently its large and key energy sector was largely unreformed. This is now changing. In 1995 the Mexican Government introduced legislative changes permitting private sector involvement in natural gas storage, transportation and distribution. Subsequent directives set up a detailed regulatory framework. These developments offer considerable promise, not only for natural gas sector development but also for growth in the closely linked electricity sector. This study analyses the changes which have taken place and the rationale for the regulatory framework which has been established. The study also contains recommendations to assist the Government of Mexico in effectively implementing its natural gas sector reforms and in maximizing the benefits to be realised through the new regulatory framework. (author)

  14. Research and regulatory review

    International Nuclear Information System (INIS)

    Macleod, J.S.; Fryer, D.R.H.

    1979-01-01

    To enable the regulatory review to be effectively undertaken by the regulatory body, there is a need for it to have ready access to information generated by research activities. Certain advantages have been seen to be gained by the regulatory body itself directly allocating and controlling some portion of these activities. The principal reasons for reaching this conclusion are summarised and a brief description of the Inspectorate's directly sponsored programme is outlined. (author)

  15. Regulatory Commission of Alaska

    Science.gov (United States)

  16. NRC Regulatory Agenda

    International Nuclear Information System (INIS)

    1991-10-01

    The NRC Regulatory Agenda is a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  17. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1993-04-01

    The NRC Regulatory Agenda is a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  18. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1990-01-01

    The NRC Regulatory Agenda is a compilation of all rules on which the NRC has proposed or is considering action and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  19. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1991-04-01

    The NRC Regulatory Agenda is a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  20. NRC Regulatory Agenda

    International Nuclear Information System (INIS)

    1991-08-01

    The NRC Regulatory Agenda is a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  1. Trust in regulatory regimes

    NARCIS (Netherlands)

    Six, Frédérique; Verhoest, Koen

    2017-01-01

    Within political and administrative sciences generally, trust as a concept is contested, especially in the field of regulatory governance. This groundbreaking book is the first to systematically explore the role and dynamics of trust within regulatory regimes. Conceptualizing, mapping and analyzing

  2. Improving nuclear regulatory effectiveness

    International Nuclear Information System (INIS)

    2001-01-01

    Ensuring that nuclear installations are operated and maintained in such a way that their impact on public health and safety is as low as reasonably practicable has been and will continue to be the cornerstone of nuclear regulation. In the past, nuclear incidents provided the main impetus for regulatory change. Today, economic factors, deregulation, technological advancements, government oversight and the general requirements for openness and accountability are leading regulatory bodies to review their effectiveness. In addition, seeking to enhance the present level of nuclear safety by continuously improving the effectiveness of regulatory bodies is seen as one of the ways to strengthen public confidence in the regulatory systems. This report covers the basic concepts underlying nuclear regulatory effectiveness, advances being made and future requirements. The intended audience is primarily nuclear safety regulators, but government authorities, nuclear power plant operators and the general public may also be interested. (author)

  3. Training assessments and assistance

    International Nuclear Information System (INIS)

    Przybylski, J.L.

    1994-07-01

    The Transportation Management Division, Office of Environmental Restoration and Waste Management (TMD/EM-261), United States Department of Energy (DOE), Training Program Manager has established an independent Training Assessment Program, the intent of which is to evaluate, exclusively, transportation and packaging training activities throughout the Department of Energy (DOE) community. The results generated from an application of the Training Assessment Program are intended to be utilized as a management tool for maintaining compliance with applicable regulatory-driven training requirements. In addition, the Transportation Assessment Program can be employed to evaluate training methodologies and, through a pre-arranged, cooperative, technical assistance effort, provide each Department of Energy (DOE) site with the means necessary to enhance its overall transportation and packaging training capabilities

  4. Assisted Living

    Science.gov (United States)

    ... a resident's needs depends as much on the philosophy and services of the assisted living facility as it does on the quality of care. The Administration on Aging, a part of the U.S. Department of Health and Human Services (HHS), offers these suggestions to help you ...

  5. Assistive Devices

    Science.gov (United States)

    If you have a disability or injury, you may use a number of assistive devices. These are tools, products or types of equipment that help you perform tasks and activities. They may help you move around, see, communicate, eat, or get ...

  6. Regulação assistencial no recife: possibilidades e limites na promoção do acesso Assistance Regulatory in Recife: possibilities and limits in promoting access

    Directory of Open Access Journals (Sweden)

    Maria do Socorro Veloso de Albuquerque

    2013-03-01

    the point of view of access with equity and integrality. A case study was conducted, having the managers from Municipal Health Department of Recife as its subjects. Content analysis was used in a thematic typology, taking as basis the concepts of government triangle, accessibility and network. It was seen that although the municipality implemented organizational arrangements for care regulation, it has neither invested in the regulation of its own specialized services, nor analyzed the potential capacity of these services. The Consultation and Specialized Exams Regulation Service absorbed only 9.5% of the procedures of medium complexity under municipal management. Moreover, little was invested in the expansion of the solvability of primary care, which contributed to keep a possibly artificial demand for specialized services. The possibility of greater organizational accessibility through the regulation of care was reduced to a process of organizing the existing demand to the capability of supply of SUS' supplementary (private) network in Recife. In some cases, this was defined by the interests of the very private sector, over which the regulation implemented by the municipal administration had reduced power of definition. The centralizing action of the Municipal Health Department hindered the creation of a shared regulatory complex between different levels of management. It is concluded that the regulation of care in the sphere of the municipalities can hardly promote full and equal access if it acts only over the services under municipal management, if it does not form regional networks of care and agreements between public entities for continued care and if it acts without subordinating private interests to the welfare needs of the population.

  7. Environmental Regulatory Update Table, May/June 1992

    Energy Technology Data Exchange (ETDEWEB)

    Houlberg, L.M.; Hawkins, G.T.; Lewis, E.B.; Salk, M.S.

    1992-07-01

    This report contains a bi-monthly update of environmental regulatory activity that is of interest to the Department of Energy. It is provided to DOE operations and contractor staff to assist and support environmental management programs by tracking regulatory developments. Any proposed regulation that raises significant issues for any DOE operation should be reported to the Office of Environmental Guidance (EH-23) as soon as possible so that the Department can make its concerns known to the appropriate regulatory agency. Items of particular interest to EH-23 are indicated by a shading of the RU#.

  8. Environmental Regulatory Update Table, May/June 1992

    Energy Technology Data Exchange (ETDEWEB)

    Houlberg, L.M.; Hawkins, G.T.; Lewis, E.B.; Salk, M.S.

    1992-07-01

    This report contains a bi-monthly update of environmental regulatory activity that is of interest to the Department of Energy. It is provided to DOE operations and contractor staff to assist and support environmental management programs by tracking regulatory developments. Any proposed regulation that raises significant issues for any DOE operation should be reported to the Office of Environmental Guidance (EH-23) as soon as possible so that the Department can make its concerns known to the appropriate regulatory agency. Items of particular interest to EH-23 are indicated by a shading of the RU#.

  9. Regulatory guide in support of ECCS rule revision

    International Nuclear Information System (INIS)

    Tovmassian, H.S.

    1987-01-01

    The US Nuclear Regulatory Commission staff is proposing to amend 10 CFR 50.46 and Appendix K to allow licensees to use best estimate calculations to estimate emergency core cooling system performance. This estimate in conjunction with an estimate of the uncertainty in the calculation would then be used to assure that the licensing limits set forth in 10 CFR 50.46(b) are not exceeded. The NRC staff has prepared a draft regulatory guide to assist licensees and applicants in complying with these proposed amendments. This paper sets forth the objectives of this regulatory guide, the approach taken, the difficulties encountered, and the current status of this effort

  10. Regulatory guidance document

    International Nuclear Information System (INIS)

    1994-05-01

    The Office of Civilian Radioactive Waste Management (OCRWM) Program Management System Manual requires preparation of the OCRWM Regulatory Guidance Document (RGD) that addresses licensing, environmental compliance, and safety and health compliance. The document provides: regulatory compliance policy; guidance to OCRWM organizational elements to ensure a consistent approach when complying with regulatory requirements; strategies to achieve policy objectives; organizational responsibilities for regulatory compliance; guidance with regard to Program compliance oversight; and guidance on the contents of a project-level Regulatory Compliance Plan. The scope of the RGD includes site suitability evaluation, licensing, environmental compliance, and safety and health compliance, in accordance with the direction provided by Section 4.6.3 of the PMS Manual. Site suitability evaluation and regulatory compliance during site characterization are significant activities, particularly with regard to the YW MSA. OCRWM's evaluation of whether the Yucca Mountain site is suitable for repository development must precede its submittal of a license application to the Nuclear Regulatory Commission (NRC). Accordingly, site suitability evaluation is discussed in Chapter 4, and the general statements of policy regarding site suitability evaluation are discussed in Section 2.1. Although much of the data and analyses may initially be similar, the licensing process is discussed separately in Chapter 5. Environmental compliance is discussed in Chapter 6. Safety and Health compliance is discussed in Chapter 7

  11. 5 CFR 340.203 - Technical assistance.

    Science.gov (United States)

    2010-01-01

    ... part-time employment practices; (3) Development of special recruitment and selection techniques for...-TIME CAREER EMPLOYMENT (PART-TIME, SEASONAL, ON-CALL, AND INTERMITTENT) Regulatory Requirements-Part-Time Employment § 340.203 Technical assistance. (a) The Office of Personnel Management shall provide...

  12. Foreign assistance

    International Nuclear Information System (INIS)

    1991-07-01

    This paper reports that providing energy assistance to developing countries remains a relatively low priority of the Agency for International Development. AID is helping some developing countries meet their energy needs, but this assistance varies substantially because of the agency's decentralized structure. Most AID energy funding has gone to a handful of countries-primarily Egypt and Pakistan. With limited funding in most other countries, AID concentrates on providing technical expertise and promoting energy policy reforms that will encourage both energy efficiency and leverage investment by the private sector and other donors. Although a 1989 congressional directive to pursue a global warming initiative has had a marginal impact on the agency's energy programming, many AID energy programs, including those directed at energy conservation, help address global warming concerns

  13. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1990-10-01

    The Regulatory Agenda is a quarterly compilation of all rules on which the NRC has recently completed action or has proposed, or is considering action and of all petitions for rulemaking that the NRC has received that are pending disposition

  14. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1990-04-01

    The Regulatory Agenda is a quarterly compilation of all rules on which the NRC has recently completed action or has proposed, or is considering action and of all petitions for rulemaking that the NRC has received that are pending disposition

  15. Through the regulatory hoop

    International Nuclear Information System (INIS)

    Kirner, N.P.

    1985-01-01

    There are many regulatory hoops through which waste generators, brokers, and disposal site operators must jump to dispose of waste safely. As the proposed exclusionary date of January 1, 1986, approaches, these regulatory hoops have the distinct possibility of multiplying or at least changing shape. The state of Washington, in its role as an Agreement State with the US Nuclear Regulatory Commission, licenses and inspects the commercial operator of the Northwest Compact's low-level radioactive waste disposal site on the Hanford Reservation. Washington has received as much as 53%, or 1.4 million cubic feet per year, of the nation's total volume of waste disposed. To control such a large volume of waste, a regulatory program involving six agencies has developed over the years in Washington

  16. Analysis and evaluation of regulatory uncertainties in 10 CFR 60 subparts B and E

    International Nuclear Information System (INIS)

    Weiner, R.F.; Patrick, W.C.

    1990-01-01

    This paper presents an attribute analysis scheme for prioritizing the resolution of regulatory uncertainties. Attributes are presented which assist in identifying the need for timeliness and durability of the resolution of an uncertainty

  17. Perceptions of regulatory approaches

    International Nuclear Information System (INIS)

    Halin, Magnus; Leinonen, Ruusaliisa

    2012-01-01

    Ms. Ruusaliisa Leinonen and Mr. Magnus Halin from Fortum gave a joint presentation on industry perceptions of regulatory oversight of LMfS/SC. It was concluded that an open culture of discussion exists between the regulator (STUK) and the licensee, based on the common goal of nuclear safety. An example was provided of how regulatory interventions helped foster improvements to individual and collective dose rate trends, which had remained static. Regulatory interventions included discussions on the ALARA concept to reinforce the requirement to continuously strive for improvements in safety performance. Safety culture has also been built into regulatory inspections in recent years. Training days have also been organised by the regulatory body to help develop a shared understanding of safety culture between licensee and regulatory personnel. Fortum has also developed their own training for managers and supervisors. Training and ongoing discussion on LMfS/SC safety culture is considered particularly important because both Fortum and the regulatory body are experiencing an influx of new staff due to the demographic profile of their organisations. It was noted that further work is needed to reach a common understanding of safety culture on a practical level (e.g., for a mechanic setting to work), and in relation to the inspection criteria used by the regulator. The challenges associated with companies with a mix of energy types were also discussed. This can make it more difficult to understand responsibilities and decision making processes, including the role of the parent body organisation. It also makes communication more challenging due to increased complexity and a larger number of stakeholders

  18. Comparison of ISO 9000 and recent software life cycle standards to nuclear regulatory review guidance

    International Nuclear Information System (INIS)

    Preckshot, G.G.; Scott, J.A.

    1998-01-01

    Lawrence Livermore National Laboratory is assisting the Nuclear Regulatory Commission with the assessment of certain quality and software life cycle standards to determine whether additional guidance for the U.S. nuclear regulatory context should be derived from the standards. This report describes the nature of the standards and compares the guidance of the standards to that of the recently updated Standard Review Plan

  19. Regional and International Networking to Support the Energy Regulatory Commission of Thailand

    Energy Technology Data Exchange (ETDEWEB)

    Lavansiri, Direk; Bull, Trevor

    2010-09-15

    The Energy Regulatory Commission of Thailand is a new regulatory agency. The structure of the energy sector; the tradition of administration; and, the lack of access to experienced personnel in Thailand all pose particular challenges. The Commission is meeting these challenges through regional and international networking to assist in developing policies and procedures that allow it to meet international benchmarks.

  20. Regulatory research program for 1987/88

    International Nuclear Information System (INIS)

    1987-01-01

    The regulatory research program of Canada's Atomic Energy Control Board (AECB) is intended to augment the AECB's research program beyond the capability of in-house resources. The overall objective of the research program is to produce pertinent and independent information that will assist the Board and its staff in making correct, timely and credible decisions on regulating nuclear energy. The program covers the following areas: the safety of nuclear facilities, radioactive waste management, health physics, physical security, and the development of regulatory processes. Sixty-seven projects are planned for 1987/88; as well, there are some projects held in reserve in case funding becomes available. This information bulletin contains a list of the projects with a brief description of each

  1. Nuclear regulatory decision making

    International Nuclear Information System (INIS)

    Wieland, Patricia; Almeida, Ivan Pedro Salati de

    2011-01-01

    The scientific considerations upon which the nuclear regulations are based provide objective criteria for decisions on nuclear safety matters. However, the decisions that a regulatory agency takes go far beyond granting or not an operating license based on assessment of compliance. It may involve decisions about hiring experts or research, appeals, responses to other government agencies, international agreements, etc. In all cases, top management of the regulatory agency should hear and decide the best balance between the benefits of regulatory action and undue risks and other associated impacts that may arise, including issues of credibility and reputation. The establishment of a decision framework based on well established principles and criteria ensures performance stability and consistency, preventing individual subjectivity. This article analyzes the challenges to the decision-making by regulatory agencies to ensure coherence and consistency in decisions, even in situations where there is uncertainty, lack of reliable information and even divergence of opinions among experts. The article explores the basic elements for a framework for regulatory decision-making. (author)

  2. Nuclear regulatory decision making

    International Nuclear Information System (INIS)

    2005-01-01

    The fundamental objective of all nuclear safety regulatory bodies is to ensure that nuclear utilities operate their plants at all times in an acceptably safe manner. In meeting this objective, the regulatory body should strive to ensure that its regulatory decisions are technically sound, consistent from case to case, and timely. In addition, the regulator must be aware that its decisions and the circumstances surrounding those decisions can affect how its stakeholders, such as government policy makers, the industry it regulates, and the public, view it as an effective and credible regulator. In order to maintain the confidence of those stakeholders, the regulator should make sure that its decisions are transparent, have a clear basis in law and regulations, and are seen by impartial observers to be fair to all parties. Based on the work of a Nuclear Energy Agency (NEA) expert group, this report discusses some of the basic principles and criteria that a regulatory body should consider in making decisions and describes the elements of an integrated framework for regulatory decision making. (author)

  3. Technology assisted training in the nuclear regulatory environment

    Energy Technology Data Exchange (ETDEWEB)

    Martin, D J [Atomic Energy Control Board, Ottawa, ON (Canada)

    1993-11-01

    The mechanics of presenting material can impede or enhance the flow and clarity of information presented during a course. This paper describes briefly how the Training Centre of the Atomic Energy Control Board enhances the effectiveness of courses by using appropriate technology: desktop publishing, video, and computer-based interactive modules. 4 figs.

  4. Prediction of regulatory elements

    DEFF Research Database (Denmark)

    Sandelin, Albin

    2008-01-01

    Finding the regulatory mechanisms responsible for gene expression remains one of the most important challenges for biomedical research. A major focus in cellular biology is to find functional transcription factor binding sites (TFBS) responsible for the regulation of a downstream gene. As wet-lab methods are time consuming and expensive, it is not realistic to identify TFBS for all uncharacterized genes in the genome by purely experimental means. Computational methods aimed at predicting potential regulatory regions can increase the efficiency of wet-lab experiments significantly. Here, methods...

  5. Rationales for regulatory activity

    Energy Technology Data Exchange (ETDEWEB)

    Perhac, R.M. [Univ. of Tennessee, Knoxville, TN (United States)

    1997-02-01

    The author provides an outline which touches on the types of concerns about risk evaluation which are addressed in the process of establishing regulatory guides. Broadly he says regulatory activity serves three broad constituents: (1) Paternalism (private risk); (2) Promotion of social welfare (public risks); (3) Protection of individual rights (public risks). He then discusses some of the major issues encountered in reaching a decision on what is an acceptable level of risk within each of these areas, and how one establishes such a level.

  6. Developing regulatory approaches

    International Nuclear Information System (INIS)

    Axelsson, Lars

    2012-01-01

    Lars Axelsson presented SSM progress on oversight of LMfS/SC since the Chester 1 Workshop in 2007. Current SSM approaches for safety culture oversight include targeted safety management and safety culture inspections, compliance inspections which cover aspects of safety management/safety culture and multi-disciplinary team inspections. Examples of themes for targeted inspections include management of ambiguous operational situations or other weak signals, understanding of and attitudes to Human Performance tools, the Safety Department's role and authority and Leadership for safety. All regulatory activities provide inputs for the SSM yearly safety evaluation of each licensee. A form has been developed to capture safety culture observations from inspections and other interactions with licensees. Analysis will be performed to identify patterns and provide information to support planning of specific Safety Culture activities. Training has been developed for regulatory staff to enhance the quality of regulatory interventions on safety culture. This includes a half-day seminar to provide an overview of safety culture, and a workshop which provides more in-depth discussion on cultural issues and how to capture those during regulatory activities. Future plans include guidance for inspectors, and informal seminars on safety culture with licensees

  7. NRC Regulatory Agenda

    International Nuclear Information System (INIS)

    1992-07-01

    This document is a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rule making which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  8. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1993-02-01

    This document is a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  9. NRC Regulatory Agenda

    International Nuclear Information System (INIS)

    1989-07-01

    This document is a compilation of all rules on which the NRC has proposed or is considering action and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  10. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1992-11-01

    This document provides a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter

  11. Comments on regulatory reform

    International Nuclear Information System (INIS)

    Hendrie, J.M.

    1982-01-01

    Nuclear regulatory reform is divided into two parts. The first part contains all those matters for which new legislation is required. The second part concerns all those matters that are within the power of the Commission under existing statutes. Recommendations are presented

  12. Comments on regulatory reform

    Energy Technology Data Exchange (ETDEWEB)

    Hendrie, J.M.

    1982-01-01

    Nuclear regulatory reform is divided into two parts. The first part contains all those matters for which new legislation is required. The second part concerns all those matters that are within the power of the Commission under existing statutes. Recommendations are presented.

  13. 3 CFR - Regulatory Review

    Science.gov (United States)

    2010-01-01

    ... as a means of promoting regulatory goals. The fundamental principles and structures governing... review. In this time of fundamental transformation, that process—and the principles governing regulation... the Office of Management and Budget (OMB) has reviewed Federal regulations. The purposes of such...

  14. Role of cooperation activities for capacity building of Romanian Regulatory Authority (CNCAN)

    International Nuclear Information System (INIS)

    Biro, L.; Ciurea-Ercau, C.

    2010-01-01

    With a slow but active nuclear sector development program since 1980, the Romanian regulatory authority had to permanently adapt to changes in the national and international environment in order to ensure a continuous increase in capacity building and effectiveness, commensurate with the growing nuclear sector. Limited human resources available at the national level put the Romanian Regulatory Authority in the position of building the Technical Support Organization as part of its own organization. International cooperation played an important role in capacity building of the Romanian regulatory body and in providing necessary assistance in performing regulatory activities or support in the development of the regulatory framework. Fellowships and technical visits, workshops and training courses provided through IAEA TC at national or regional level, and technical assistance provided by the European Commission (EC) through PHARE Projects, all made a valuable contribution to the training of regulatory staff and the development of a proper regulatory framework in Romania. Therefore, the Romanian Regulatory Authority places strong emphasis on strengthening and promoting international cooperation through the IAEA Technical Cooperation Programme and MoUs between regulatory bodies, as one of the key elements in supporting capacity building of regulatory authorities in countries having small nuclear power programs or embarking on one. Building networks between training centers and research facilities and establishing regional training centers represent one of the viable future options for preserving knowledge in the nuclear field. (author)

  15. Experience Transformed into Nuclear Regulatory Improvements in Russia

    International Nuclear Information System (INIS)

    Sapozhnikov, A.

    2016-01-01

    The third International Conference on Effective Nuclear Regulatory Systems (Canada, 2013) identified the main action items that should be addressed, implemented and followed up. The key technical and organizational areas important to strengthening reactor and spent fuel safety have been determined as follows: • Regulatory lessons learned and actions taken (since the accident at the Fukushima Daiichi NPP); • Waste management and spent fuel safety; • Emergency management; • Emerging programmes; • Human and organizational factors, safety and security culture. Over time many activities based on results of the IAEA Integrated Regulatory Review Service in the Russian Federation, 2009, and post-mission, 2013, have been implemented. At present there is progress on the national action plan on nuclear safety, preparation and conduct of long term spent fuel management, complementary reviews for nuclear facilities other than Nuclear Power Plants, emergency exercises with the regulatory body's participation, improving communication, development of national regulations and improvement of the regulatory system as a whole. The regulatory body provides assistance in the development of national regulatory infrastructure and safety culture to the countries planning to construct Russian design facilities (NPPs, RRs). The report outlines the results and future actions to improve nuclear regulation based on a systematic approach to safety and particularly reflects the specificity of taking measures for the research reactors. (author)

  16. pubmed. mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

  17. Science and Technology Text Mining: Origins of Database Tomography and Multi-Word Phrase Clustering

    Science.gov (United States)

    2003-08-15

    six decades to the pioneering work in: 1) lexicography of Hornby [1942] to account for co-occurrence knowledge, and 2) linguistics of De Saussure ...of Development in a Research Field," Scientometrics, Vol.19, No.1, 1990b. De Saussure, F., "Cours de Linguistique Generale," 4eme Edition, Librairie

  18. Food safety ontology and text mining strategies as a tool in (re)emerging risk identification

    NARCIS (Netherlands)

    Brug, F. van de

    2009-01-01

    Industry and government are held responsible for the safety of food and feed products. Therefore actual and relevant information concerning emerging safety risks is crucial. But how is it possible to filter relevant information from the fast growing volumes of information produced by science and the

  19. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    Science.gov (United States)

    2010-01-01

    Background Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a 1/3 to a 1/4 the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. 
    ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist. PMID:20331846
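
    The F-score comparison in the record above can be checked from the reported precision and recall values. The sketch below assumes the balanced F-score (F1, the harmonic mean of precision and recall); the abstract does not state which F-score variant the authors used, and the function and variable names are illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) after filtering and disambiguation, as quoted in the abstract.
reported = {
    "ChemSpider": (0.87, 0.19),
    "Chemlist": (0.67, 0.40),
}

# Assumption: F1 is the intended measure; the abstract only says "F-score".
scores = {name: f_score(p, r) for name, (p, r) in reported.items()}

for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

    Under this assumption the values come out at roughly 0.31 for ChemSpider and 0.50 for Chemlist, which is consistent with the conclusion that Chemlist had the best F-score despite its lower precision.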

  20. Development of Human Face Literature Database Using Text Mining Approach: Phase I.

    Science.gov (United States)

    Kaur, Paramjit; Krishan, Kewal; Sharma, Suresh K

    2018-06-01

    The face is an important part of the human body by which an individual communicates in society. Its importance can be highlighted by the fact that a person deprived of face cannot sustain in the living world. The number of experiments being performed and the number of research papers being published in the domain of the human face have surged in the past few decades. Several scientific disciplines that are conducting research on the human face include: Medical Science, Anthropology, Information Technology (Biometrics, Robotics, and Artificial Intelligence, etc.), Psychology, Forensic Science, Neuroscience, etc. This signals the need to collect and manage the data concerning the human face so that public and free access to it can be provided to the scientific community. This can be attained by developing databases and tools on the human face using a bioinformatics approach. The current research focuses on creating a database of literature data on the human face. The database can be accessed on the basis of specific keywords, journal name, date of publication, author's name, etc. The collected research papers will be stored in the form of a database. Hence, the database will be beneficial to the research community, as comprehensive information dedicated to the human face can be found in one place. Information related to facial morphologic features, facial disorders, facial asymmetry, facial abnormalities, and many other parameters can be extracted from this database. The front end has been developed using Hyper Text Mark-up Language and Cascading Style Sheets. The back end has been developed using hypertext preprocessor (PHP). JavaScript has been used as the scripting language. MySQL (Structured Query Language) is used for database development as it is the most widely used Relational Database Management System. XAMPP (X (cross platform), Apache, MySQL, PHP, Perl) open source web application software has been used as the server. The database is still in the developmental phase, and the paper discusses the initial steps of its creation. The current paper throws light on the work done to date.

  1. pubmed. mineR: An R package with text-mining algorithms to ...

    Indian Academy of Sciences (India)

    2016-08-26

    Aug 26, 2016 ... Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus ...

  2. Bioinformatics Methods for Interpreting Toxicogenomics Data: The Role of Text-Mining

    NARCIS (Netherlands)

    Hettne, K.M.; Kleinjans, J.; Stierum, R.H.; Boorsma, A.; Kors, J.A.

    2014-01-01

    This chapter concerns the application of bioinformatics methods to the analysis of toxicogenomics data. The chapter starts with an introduction covering how bioinformatics has been applied in toxicogenomics data analysis, and continues with a description of the foundations of a specific

  3. A text-mining analysis of the public's reactions to the opioid crisis.

    Science.gov (United States)

    Glowacki, Elizabeth M; Glowacki, Joseph B; Wilcox, Gary B

    2017-07-19

    Opioid abuse has become an epidemic in the United States. On August 25, 2016, the former Surgeon General of the United States sent an open letter to care providers asking for their help with combatting this growing health crisis. Social media forums such as Twitter allow for open discussions among the public and up-to-date exchanges of information about timely topics such as opioids. Therefore, the goal of the current study is to identify the public's reactions to the opioid epidemic by identifying the most popular topics tweeted by users. A text miner, algorithmic-driven statistical program was used to capture 73,235 original tweets and retweets posted within a 2-month time span (August 15, 2016, through October 15, 2016). All tweets contained references to "opioids," "turnthetide," or similar keywords. The sets of tweets were then analyzed to identify the most prevalent topics. The most discussed topics had to do with public figures addressing opioid abuse, creating better treatment options for teen addicts, using marijuana as an alternative for managing pain, holding foreign and domestic drug makers accountable for the epidemic, promoting the "Rx for Change" campaign, addressing double standards in the perceptions and treatment of black and white opioid users, and advertising opioid recovery programs. Twitter allows users to find current information, voice their concerns, and share calls for action in response to the opioid epidemic. Monitoring the conversations about opioids that are taking place on social media forums such as Twitter can help public health officials and care providers better understand how the public is responding to this health crisis.

  4. A novel procedure on next generation sequencing data analysis using text mining algorithm.

    Science.gov (United States)

    Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

    2016-05-13

    Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.

  5. A Text Mining Approach for Extracting Lessons Learned from Project Documentation: An Illustrative Case Study

    Directory of Open Access Journals (Sweden)

    Benjamin Matthies

    2017-12-01

    Lessons learned are important building blocks for continuous learning in project-based organisations. Nonetheless, the practical reality is that lessons learned are often not consistently reused for organisational learning. Two problems are commonly described in this context: the information overload and the lack of procedures and methods for the assessment and implementation of lessons learned. This paper addresses these problems, and appropriate solutions are combined in a systematic lesson learned process. Latent Dirichlet Allocation is presented to solve the first problem. Regarding the second problem, established risk management methods are adapted. The entire lessons learned process will be demonstrated in a practical case study.

  6. Text mining to detect indications of fraud in annual reports worldwide

    NARCIS (Netherlands)

    Fissette, Marcia Valentine Maria

    2017-01-01

    The research described in this thesis examined the contribution of text analysis to detecting indications of fraud in the annual reports of companies worldwide. A total of 1,727 annual reports have been collected, of which 402 are of the years and companies in which fraudulent activities took place,

  7. (Text) Mining the LANDscape: Themes and Trends over 40 years of Landscape and Urban Planning

    Science.gov (United States)

    Paul H. Gobster

    2014-01-01

    In commemoration of the journal's 40th anniversary, the co-editor explores themes and trends covered by Landscape and Urban Planning and its parent journals through a qualitative comparison of co-occurrence term maps generated from the text corpora of its abstracts across the four decadal periods of publication. Cluster maps generated from the...

  8. Improving Collaborative Learning in the Classroom: Text Mining Based Grouping and Representing

    Science.gov (United States)

    Erkens, Melanie; Bodemer, Daniel; Hoppe, H. Ulrich

    2016-01-01

    Orchestrating collaborative learning in the classroom involves tasks such as forming learning groups with heterogeneous knowledge and making learners aware of the knowledge differences. However, gathering information on which the formation of appropriate groups and the creation of graphical knowledge representations can be based is very effortful…

  9. Estimation of Cross-Lingual News Similarities Using Text-Mining Methods

    Directory of Open Access Journals (Sweden)

    Zhouhao Wang

    2018-01-01

    In this research, two estimation algorithms for extracting cross-lingual news pairs based on machine learning from financial news articles have been proposed. Every second, innumerable text data, including all kinds of news, reports, messages, reviews, comments, and tweets, are generated on the Internet, and these are written not only in English but also in other languages such as Chinese, Japanese, French, etc. By taking advantage of multi-lingual text resources provided by Thomson Reuters News, we developed two estimation algorithms for extracting cross-lingual news pairs from multilingual text resources. In our first method, we propose a novel structure that uses the word information and the machine learning method effectively in this task. Simultaneously, we developed a bidirectional Long Short-Term Memory (LSTM)-based method to calculate cross-lingual semantic text similarity for long text and short text, respectively. Thus, when an important news article is published, users can read similar news articles that are written in their native language using our method.

  10. Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.

    Science.gov (United States)

    Müller, H-M; Van Auken, K M; Li, Y; Sternberg, P W

    2018-03-09

    The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. 
    It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any database in the world. Textpresso Central URL: http://www.textpresso.org/tpc.

  11. Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized versus Common Languages

    Science.gov (United States)

    Jarman, Jay

    2011-01-01

    This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…

  12. DESTAF: A database of text-mined associations for reproductive toxins potentially affecting human fertility

    KAUST Repository

    Dawe, Adam Sean; Radovanovic, Aleksandar; Kaur, Mandeep; Sagar, Sunil; Seshadri, Sundararajan Vijayaraghava; Schaefer, Ulf; Kamau, Allan; Christoffels, Alan G.; Bajic, Vladimir B.

    2012-01-01

    The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource which enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF we used data from the literature (including over 10,500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing for a more in-depth exploration of information based on different gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomically centric perspectives. As a special feature, DESTAF allows for the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly. © 2011 Elsevier Inc.

  13. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    Science.gov (United States)

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  14. Text mining in students' course evaluations: Relationships between open-ended comments and quantitative scores

    DEFF Research Database (Denmark)

    Sliusarenko, Tamara; Clemmensen, Line Katrine Harder; Ersbøll, Bjarne Kjær

    2013-01-01

    Extensive research has been done on student evaluations of teachers and courses based on quantitative data from evaluation questionnaires, but little research has examined students' written responses to open-ended questions and their relationships with quantitative scores. This paper analyzes suc...

  15. The Use of Gamification in Education: A Bibliometric and Text Mining Analysis

    Science.gov (United States)

    Martí-Parreño, J.; Méndez-Ibáñez, E.; Alonso-Arroyo, A.

    2016-01-01

    The use of games in education represents a promising tool to motivate and engage students in their learning process. Most previous research on the topic has focused on developing theoretical frameworks or conducting experiments as a means to analyse learning outcomes such as knowledge retention, problem-solving skills gains or attitudes toward…

  16. Exploiting Structure and Conventions of Movie Scripts for Information Retrieval and Text Mining

    DEFF Research Database (Denmark)

    Jhala, Arnav

    2008-01-01

    Movie scripts are documents that describe the story, stage direction for actors and camera, and dialogue. Script writers, directors, and cinematographers have standardized the format and language that is used in script writing. Scripts contain a wealth of information about narrative patterns, cha...

  17. Newspaper archives + text mining = rich sources of historical geo-spatial data

    Science.gov (United States)

    Yzaguirre, A.; Smit, M.; Warren, R.

    2016-04-01

    Newspaper archives are rich sources of cultural, social, and historical information. These archives, even when digitized, are typically unstructured and organized by date rather than by subject or location, and require substantial manual effort to analyze. The effort of journalists to be accurate and precise means that there is often rich geo-spatial data embedded in the text, alongside text describing events that editors considered to be of sufficient importance to the region or the world to merit column inches. A regional newspaper can add over 100,000 articles to its database each year, and extracting information from this data for even a single country would pose a substantial Big Data challenge. In this paper, we describe a pilot study on the construction of a database of historical flood events (location(s), date, cause, magnitude) to be used in flood assessment projects, for example to calibrate models, estimate frequency, establish high water marks, or plan for future events in contexts ranging from urban planning to climate change adaptation. We then present a vision for extracting and using the rich geospatial data available in unstructured text archives, and suggest future avenues of research.

  18. Text-mining strategies to support computational research in chemical toxicity (ACS 2017 Spring meeting)

    Science.gov (United States)

    With 26 million citations, PubMed is one of the largest sources of information about the activity of chemicals in biological systems. Because this information is expressed in natural language and not stored as data, using the biomedical literature directly in computational resear...

  19. Web-Based Collaborative Writing in L2 Contexts: Methodological Insights from Text Mining

    Science.gov (United States)

    Yim, Soobin; Warschauer, Mark

    2017-01-01

    The increasingly widespread use of social software (e.g., Wikis, Google Docs) in second language (L2) settings has brought a renewed attention to collaborative writing. Although the current methodological approaches to examining collaborative writing are valuable to understand L2 students' interactional patterns or perceived experiences, they can…

  20. Working in a Text Mine; Is Access about to Go down?

    Science.gov (United States)

    Emery, Jill

    2008-01-01

    The age of networked research and networked data analysis is upon us. "Wired Magazine" proclaims on the cover of their July 2008 issue: "The End of Science. The quest for knowledge used to begin with grand theories. Now it begins with massive amounts of data. Welcome to the Petabyte Age." Computing technology is sufficiently complex at this point…

  1. Text mining scientific papers: a survey on FCA-based information retrieval research

    NARCIS (Netherlands)

    Poelmans, J.; Ignatov, D.I.; Viaene, S.; Dedene, G.; Kuznetsov, S.O.

    2012-01-01

    Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 which mention FCA and information retrieval in the abstract, title or keywords.

  2. The Use of Systemic-Functional Linguistics in Automated Text Mining

    Science.gov (United States)

    2009-03-01

    what degree two or more documents are similar in terms of their meaning. Simply put, such a cognitive model aims to link the physical manifestation...These features, both in terms of frequency and their chaining across a text, were taken as salient stylistic features that had a direct relationship to...because SFL attempts to model these cognitive processes, this has the potential to improve NLP tasks by making them more ’human-like’. Secondly

  3. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

    Directory of Open Access Journals (Sweden)

    Yin Wang

    2016-01-01

    Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  4. Text mining and network analysis to find functional associations of genes in high altitude diseases.

    Science.gov (United States)

    Bhasuran, Balu; Subramanian, Devika; Natarajan, Jeyakumar

    2018-05-02

    Travel to elevations above 2500 m is associated with the risk of developing one or more forms of acute altitude illness such as acute mountain sickness (AMS), high altitude cerebral edema (HACE) or high altitude pulmonary edema (HAPE). Our work aims to identify the functional association of genes involved in high altitude diseases. In this work we identified the gene networks responsible for high altitude diseases by using the principle of gene co-occurrence statistics from literature and network analysis. First, we mined the literature data from PubMed on high-altitude diseases, and extracted the co-occurring gene pairs. Next, based on their co-occurrence frequency, gene pairs were ranked. Finally, a gene association network was created using statistical measures to explore potential relationships. Network analysis results revealed that EPO, ACE, IL6 and TNF are the top five genes that were found to co-occur with 20 or more genes, while the association between EPAS1 and EGLN1 genes is strongly substantiated. The network constructed from this study proposes a large number of genes that work in-toto in high altitude conditions. Overall, the result provides a good reference for further study of the genetic relationships in high altitude diseases. Copyright © 2018 Elsevier Ltd. All rights reserved.
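    The three steps described in this abstract (extract co-occurring gene pairs, rank them by frequency, build a network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, gene mentions are assumed to have been recognised already, and the three toy "abstracts" are invented. The statistical measures the paper uses to weight associations are not reproduced here; raw counts stand in for them.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstract_genes):
    """Count how many abstracts mention each unordered gene pair."""
    pair_counts = Counter()
    for genes in abstract_genes:
        # Deduplicate and sort so (A, B) and (B, A) map to the same key.
        for a, b in combinations(sorted(set(genes)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def degrees(pair_counts, min_count=1):
    """Number of distinct partners per gene, keeping pairs seen >= min_count times."""
    neighbours = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    return {gene: len(partners) for gene, partners in neighbours.items()}


# Toy input (invented): the gene symbols recognised in each abstract.
abstracts = [
    ["EPAS1", "EGLN1"],
    ["EPAS1", "EGLN1", "EPO"],
    ["EPO", "ACE"],
]

pairs = cooccurrence_counts(abstracts)
ranked = pairs.most_common()   # gene pairs ordered by co-occurrence frequency
hubs = degrees(pairs)          # genes with many partners are candidate hubs

print(ranked[0])
print(hubs)
```

    Counting each pair once per abstract, rather than once per mention, keeps a single long paper from dominating the ranking; whether the original study made the same choice is not stated in the abstract.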

  5. Rapid Prototyping of a Text Mining Application for Cryptocurrency Market Intelligence

    OpenAIRE

    Laskowski, Marek; Kim, Henry M.

    2016-01-01

    Blockchain represents a technology for establishing a shared, immutable version of the truth between a network of participants that do not trust one another, and therefore has the potential to disrupt any financial or other industries that rely on third-parties to establish trust. Recent trends in computing including: prevalence of Free and Open Source Software (FOSS); easy access to High Performance Computing (HPC i.e. 'The Cloud'); and increasingly advanced analytics capabilities such as Na...

  6. Food safety ontology and text mining strategies as a tool in (re)emerging risk identification

    NARCIS (Netherlands)

    Ommen, B. van

    2009-01-01

    Vitamins and many minerals are essential micronutrients, and adequate intake is a major public health concern. This led to the establishment of recommended daily intakes, including subgroup differentiation based on variability and vulnerability. “Western” dietary habits promoted a shift from a

  7. ChemicalTagger: A tool for semantic text-mining in chemistry.

    Science.gov (United States)

    Hawizy, Lezan; Jessop, David M; Adams, Nico; Murray-Rust, Peter

    2011-05-16

    The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.

  8. ChemicalTagger: A tool for semantic text-mining in chemistry

    OpenAIRE

    Hawizy Lezan; Jessop David M; Adams Nico; Murray-Rust Peter

    2011-01-01

    Abstract Background The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt make their contributions well suited to high-th...

  9. ChemicalTagger: A tool for semantic text-mining in chemistry

    Directory of Open Access Journals (Sweden)

    Hawizy Lezan

    2011-05-01

    Abstract Background The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Results We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). Conclusions It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.

  10. Using text mining and machine learning for detection of child abuse

    NARCIS (Netherlands)

    Amrit, Chintan Amrit; Paauw, Tim; Aly, Robin; Lavric, Miha

    2016-01-01

    Abuse in any form is a grave threat to a child's health. Public health institutions in the Netherlands try to identify and prevent different kinds of abuse, and building a decision support system can help such institutions achieve this goal. Such decision support relies on the analysis of relevant

  11. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

    Science.gov (United States)

    Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M

    2015-01-01

    Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature based features are useful for predicting human protein function (F-max: Molecular Function =0.408, Biological Process =0.461, Cellular Component =0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
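    The F-max figures quoted in this abstract are the maximum F-measure obtained over all decision thresholds. The sketch below shows that idea in a deliberately simplified form: it pools all (protein, term) predictions into one list, whereas function-prediction evaluations typically average precision and recall per protein before taking the maximum. The function name, threshold grid, and toy scores are assumptions made for illustration and are not taken from the paper.

```python
def f_max(scores, labels, thresholds=None):
    """Maximum F1 over a grid of score thresholds.

    scores: predicted confidences in [0, 1], one per candidate annotation
    labels: 1 if the annotation is correct, 0 otherwise
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 100)]
    best = 0.0
    for t in thresholds:
        # Everything scoring at least t counts as a positive prediction.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # precision and recall are both zero or undefined
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy predictions (invented): two correct and two incorrect annotations.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]

print(f_max(scores, labels))  # approximately 0.8, reached for thresholds in (0.2, 0.4]
```

    Because the threshold is chosen after seeing the labels, F-max is an optimistic summary of a ranking rather than the performance of any single fixed cut-off.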

  12. Nuclear Regulatory Commission information digest

    International Nuclear Information System (INIS)

    1990-03-01

    The Nuclear Regulatory Commission information digest provides summary information regarding the US Nuclear Regulatory Commission, its regulatory responsibilities, and areas licensed by the commission. This is an annual publication for the general use of the NRC Staff and is available to the public. The digest is divided into two parts: the first presents an overview of the US Nuclear Regulatory Commission and the second provides data on NRC commercial nuclear reactor licensees and commercial nuclear power reactors worldwide

  13. Opening up to Big Data: Computer-Assisted Analysis of Textual Data in Social Sciences

    Directory of Open Access Journals (Sweden)

    Gregor Wiedemann

    2013-05-01

    Two developments in computational text analysis may change the way qualitative data analysis in social sciences is performed: 1. the availability of digital text worth investigating is growing rapidly, and 2. the improvement of algorithmic information extraction approaches, also called text mining, allows for further bridging the gap between qualitative and quantitative text analysis. The key factor hereby is the inclusion of context into computational linguistic models, which extends conventional computational content analysis towards the extraction of meaning. To clarify methodological differences of various computer-assisted text analysis approaches, the article suggests a typology from the perspective of a qualitative researcher. This typology shows compatibilities between manual qualitative data analysis methods and computational, rather quantitative approaches for large scale mixed method text analysis designs. URN: http://nbn-resolving.de/urn:nbn:de:0114-fqs1302231

  14. Characteristics of regulatory regimes

    Directory of Open Access Journals (Sweden)

    Noralv Veggeland

    2013-03-01

    The overarching theme of this paper is institutional analysis of basic characteristics of regulatory regimes. The concepts of path dependence and administrative traditions are used throughout. Self-reinforcing or positive feedback processes in political systems represent a basic framework. The empirical point of departure is the EU public procurement directive linked to OECD data concerning use of outsourcing among member states. The question is asked: What has caused the Nordic countries, traditionally not belonging to the Anglo-Saxon market-centred administrative tradition, to be placed so high on the ranking as users of the Market-Type Mechanism (MTM) of outsourcing in the public sector vs. in-house provision of services? A thesis is that the reason may be complex, but might be found in an innovative Scandinavian regulatory approach rooted in the Nordic model.

  15. NRC regulatory agenda

    International Nuclear Information System (INIS)

    1993-07-01

    The NRC Regulatory Agenda is a compilation of all rules on which the NRC has recently completed action, or has proposed action, or is considering action, and all petitions for rulemaking which have been received by the Commission and are pending disposition by the Commission. The Regulatory Agenda is updated and issued each quarter. The rules on which final action has been taken since March 31, 1993 are: Repeal of NRC standards of conduct; Fitness-for-duty requirements for licensees who possess, use, or transport Category I material; Training and qualification of nuclear power plant personnel; Monitoring the effectiveness of maintenance at nuclear power plants; Licensing requirements for land disposal of radioactive wastes; and Licensees' announcements of safeguards inspections

  16. A flexible regulatory framework

    International Nuclear Information System (INIS)

    Silvennoinen, T.

    2000-01-01

    Regulatory reform of the Finnish electricity market meant opening up potentially competitive parts of the electricity sector to competition and eliminating all unnecessary forms of regulation covering generation, wholesale supply, retail supply, and foreign trade in electricity. New types of control and regulatory mechanisms and institutions were set up for those parts of the electricity industry that were excluded from competition, such as network operations. Network activities now have to be licensed, whereas no licence is needed for generation or supply. A new sector-specific regulatory authority, known as the Electricity Market Authority, was established in 1995 to coincide with the implementation of the Electricity Market Act. This is responsible for regulating network activities and retail supply to captive customers. The core function of the authority, which employs some 14 people, is to promote the smooth operation of the Finnish electricity market and to oversee the implementation of the Electricity Market Act and its provisions. Its most important duties are linked to overseeing the process by which network companies price their electricity. As price regulation no longer exists, all the companies in the electricity sector set their tariffs independently, even network companies. The job of controlling the pricing of network services is handled by the Electricity Market Authority, following the principles of competition control. Pricing control takes place ex post, after a pricing system has been adopted by a company, and concentrates on individual cases and companies. There is no ex ante system of setting or approving prices and tariffs by the regulator. The tariffs and pricing of network services can be evaluated, however, by both the Electricity Market Authority and the Finnish Competition Authority, which have overlapping powers as regards the pricing of network activities. The Finnish regulatory framework can be described as a system of light

  17. Assisted Vaginal Delivery

    Science.gov (United States)

    ACOG patient FAQ: Assisted Vaginal Delivery (FAQ192, February 2016), filed under Labor, Delivery, and Postpartum Care. What is ...

  18. The changing regulatory environment

    International Nuclear Information System (INIS)

    Caron, G.

    1999-01-01

    The role and value of regulation in the energy sector was discussed, demonstrating how, despite common perception, regulation is an essential part of Canada's strategy to find and develop new opportunities. The future vision of regulation for industry participants was presented with particular focus on issues related to streamlining the regulatory process. As far as pipelines are concerned, regulatory actions are necessary to facilitate capacity increases and to ensure the line's integrity, safety and environmental record. Furthermore, regulation provides economic solutions where market forces cannot provide them, as for example where business has elements of monopoly. It arbitrates interests of landowners, business, consumers, and environmental groups. It looks for ways to ensure conditions under which competition can flourish. It acts as the guardian of citizens' rights in a democratic society by providing citizens with an opportunity to be heard on the building or expansion of pipelines and associated facilities. As citizens become more and more concerned about their property and the land that surrounds them, citizen involvement in decision making about how industry activity affects their quality of life will become correspondingly more important. Regulatory agencies are committed to facilitating this engagement through flexible hearing procedures and by making use of evolving communication and information technology.

  19. Regulatory aspects of radiopharmaceuticals

    International Nuclear Information System (INIS)

    Kristensen, K.

    1985-01-01

    Regulatory systems in the field of radiopharmaceuticals have two main purposes: efficacy and safety. Efficacy expresses the quality of the diagnostic and therapeutic process for the patient. Safety involves the patient, the staff, and the environment. The world situation regarding regulations for radiopharmaceuticals is reviewed on the basis of a survey in WHO Member States. The main content of such regulations is discussed. The special properties of radiopharmaceuticals compared with ordinary drugs may call for modified regulations. Several countries are preparing such regulations. Close co-operation and good understanding among scientists working in hospital research, industry and regulatory bodies will be of great importance for the fast and safe introduction of new radiopharmaceuticals for the benefit of the patient. Before introducing new legislation in this field, a radiopharmaceutical expert should analyse the situation in the country and the relationship to the existing regulations. It is expected that the most important factor in promoting the fast introduction of new, safe and effective radiopharmaceuticals will be the training of people working within the regulatory bodies. It is foreseen that the IAEA and WHO will have an important role to play by providing expert advice and training in this area. (author)

  20. Canine assisted reading

    OpenAIRE

    Sever, Jerneja

    2016-01-01

    The diploma thesis presents various aspects of animals included in animal-assisted interventions. In the theoretical part, I introduce the different possible forms of animal-assisted interventions: animal-assisted therapy, animal-assisted activities and animal-assisted education. Animals have become common visitors in educational settings all over the world. I present their positive influences on various aspects of human life, as well as limitations when animal-assisted interventions are not possible to perform ...

  1. The Atomic Energy Control Board's regulatory research and support program

    International Nuclear Information System (INIS)

    1988-04-01

    The purpose of the Regulatory Research and Support Program is to augment and extend the capability of the Atomic Energy Control Board's (AECB) regulatory program beyond the capability of in-house resources. The overall objective of the program is to produce pertinent and independent scientific and other knowledge and expertise that will assist the AECB in making correct, timely and credible decisions on regulating the development, application and use of atomic energy. The objectives are achieved through contracted research, development, studies, consultant and other kinds of projects administered by the Research and Radiation Protection Branch (RRB) of the AECB

  2. The Regulatory Plan

    Science.gov (United States)

    2010-12-20

    ... goals with the interest in economic recovery. Finally, in emphasizing the value of providing access to... Nutrition Assistance Program: Farm Bill of 2008 Retailer Sanctions 0584-AD88 Proposed Rule Stage 12 Fresh... Children for Free Meals in the NSLP, SBP, and SMP 15 Special Supplemental Nutrition Program for Women...

  3. Using Inequality Measures to Incorporate Environmental Justice into Regulatory Analyses

    Science.gov (United States)

    Harper, Sam; Ruder, Eric; Roman, Henry A.; Geggel, Amelia; Nweke, Onyemaechi; Payne-Sturges, Devon; Levy, Jonathan I.

    2013-01-01

    Formally evaluating how specific policy measures influence environmental justice is challenging, especially in the context of regulatory analyses in which quantitative comparisons are the norm. However, there is a large literature on developing and applying quantitative measures of health inequality in other settings, and these measures may be applicable to environmental regulatory analyses. In this paper, we provide information to assist policy decision makers in determining the viability of using measures of health inequality in the context of environmental regulatory analyses. We conclude that quantification of the distribution of inequalities in health outcomes across social groups of concern, considering both within-group and between-group comparisons, would be consistent with both the structure of regulatory analysis and the core definition of environmental justice. Appropriate application of inequality indicators requires thorough characterization of the baseline distribution of exposures and risks, leveraging data generally available within regulatory analyses. Multiple inequality indicators may be applicable to regulatory analyses, and the choice among indicators should be based on explicit value judgments regarding the dimensions of environmental justice of greatest interest. PMID:23999551

  4. Nuclear regulatory communication with the public: 10 years of progress

    International Nuclear Information System (INIS)

    Gauvain, J.; Jorle, A.; Chanial, L.

    2008-01-01

    The NEA has an acknowledged role to assist its member countries in maintaining and developing, through international co-operation, the scientific, technological and legal bases required for a safe, environmentally friendly and economical use of nuclear energy. In this context, the NEA Committee on Nuclear Regulatory Activities (CNRA) provides a forum for senior representatives from nuclear regulatory bodies to exchange information and experience on nuclear regulatory policies and practices in NEA member countries and to review developments which could affect regulatory requirements. Public confidence in government and in risk management structures is important to all developed countries with an open society. The use of nuclear power in a democracy is built upon a certain trust in the political system and the national authorities. To foster and maintain such trust in a period of greater public scrutiny of nuclear activities, a number of nuclear regulatory organisations (NROs) initiated various processes to pro-actively inform the public about their supervision and control of nuclear activities, or when appropriate to involve the public in decision making. In 1998 the question was raised within the CNRA of whether public trust in the regulator might be very different from one country to another, and an activity was started among member countries to exchange experience and best practices and to learn lessons about NRO communication with their publics. Three workshops were organised by the NEA, and a Working Group on Public Communication of Nuclear Regulatory Organisations was set up in 2001. The activities and findings are summarised below. (author)

  5. Handbook for value-impact assessments of NRC regulatory actions

    International Nuclear Information System (INIS)

    Mullen, M.F.; DiPalo, A.J.

    1985-01-01

    According to current Nuclear Regulatory Commission (NRC) procedures, value-impact (cost-benefit) assessments must be prepared for all rulemaking actions and for a broad range of other regulatory requirements and guidance. Probabilistic risk assessment (PRA) methods furnish an important part of the information base for these assessments. PRA methods are frequently the principal quantitative tool for estimating the benefits (e.g., public risk reduction) of proposed regulatory actions. In December 1983, the NRC published A Handbook for Value-Impact Assessment, NUREG/CR-3568, which provides a set of systematic procedures for performing value-impact assessments. The Handbook contains methods, data, and sources of information that can assist the regulatory analyst in conducting such assessments. The use of probabilistic risk analysis to estimate the benefits of proposed regulatory actions is described. Procedures and methods are also given for evaluating the costs and other consequences associated with regulatory actions. The Handbook has been adopted by the NRC as the recommended guideline for value impact assessments. This paper presents the background, objectives, and scope of the Handbook, describes the value-impact assessment methods (including the use of probabilistic risk assessment to estimate benefits), and discusses a selection of current and planned applications, with examples to illustrate how the methods are used

  6. 13 CFR 108.2010 - Restrictions on use of Operational Assistance grant funds.

    Science.gov (United States)

    2010-01-01

    ... Assistance in connection with a Low-Income Investment made by the SSBIC with Regulatory Capital raised after... ADMINISTRATION NEW MARKETS VENTURE CAPITAL ("NMVC") PROGRAM Requirements and Procedures for Operational...

  7. Schedules for Regulatory Regimes

    International Nuclear Information System (INIS)

    Austvik, Ole Gunnar

    2003-01-01

    The idea of regulating transporters' terms of operations is that if the market itself does not produce optimal outcomes, then it can be mimicked to do so through regulatory and other public instruments. The first-best solution could be a subsidized (publicly owned) enterprise that sets tariffs according to marginal costs. This has been the tradition in many European countries in the aftermath of WW2. Due to lack of innovative pressure on and X-inefficiency in these companies, this solution is today viewed as inferior to the system of regulating independent (privately owned) firms. When the European gas market becomes liberalized, part of the process in many countries is to (partially) privatise the transport utilities. Privatised or not, in a liberalized market, the transport utilities should face an independent authority that oversees their operations not only in technical, but also in economic terms. Under regulation, a "visible hand" is introduced to correct the imperfect market's "invisible hand". By regulating the framework and conditions for how firms may operate, public authorities seek to achieve what is considered optimal for the society. The incentives and disincentives given for pricing and production should create mechanisms leading to an efficient allocation of resources and "acceptable" distribution of income. As part of intervening into firms' behavior, regulation may be introduced to direct the firm to behave in certain ways. The framework and regulatory mechanisms for the market must then be constructed in a way that companies voluntarily produce an amount at a price that gives maximal profits and simultaneously satisfies social goals. The regulations should lead to consistency between the company's desire to maximize profits and the society's desire for maximizing welfare, as in a perfectly competitive market. This is the core of regulatory economics

  8. E3Net: a system for exploring E3-mediated regulatory networks of cellular functions.

    Science.gov (United States)

    Han, Youngwoong; Lee, Hodong; Park, Jong C; Yi, Gwan-Su

    2012-04-01

    Ubiquitin-protein ligase (E3) is a key enzyme targeting specific substrates in diverse cellular processes for ubiquitination and degradation. The existing findings of substrate specificity of E3 are, however, scattered over a number of resources, making it difficult to study them together with an integrative view. Here we present E3Net, a web-based system that provides a comprehensive collection of available E3-substrate specificities and a systematic framework for the analysis of E3-mediated regulatory networks of diverse cellular functions. Currently, E3Net contains 2201 E3s and 4896 substrates in 427 organisms and 1671 E3-substrate specific relations between 493 E3s and 1277 substrates in 42 organisms, extracted mainly from MEDLINE abstracts and UniProt comments with an automatic text mining method and additional manual inspection and partly from high throughput experiment data and public ubiquitination databases. The significant functions and pathways of the extracted E3-specific substrate groups were identified from a functional enrichment analysis with 12 functional category resources for molecular functions, protein families, protein complexes, pathways, cellular processes, cellular localization, and diseases. E3Net includes interactive analysis and navigation tools that make it possible to build an integrative view of E3-substrate networks and their correlated functions with graphical illustrations and summarized descriptions. As a result, E3Net provides a comprehensive resource of E3s, substrates, and their functional implications summarized from the regulatory network structures of E3-specific substrate groups and their correlated functions. This resource will facilitate further in-depth investigation of ubiquitination-dependent regulatory mechanisms. E3Net is freely available online at http://pnet.kaist.ac.kr/e3net.

  9. International regulatory activities

    International Nuclear Information System (INIS)

    Anon.

    2009-01-01

    This last part reviews international regulatory activities and bilateral agreements in two parts. The first concerns the European Atomic Energy Community: the European Commission proposal for a Council directive setting up a Community framework for nuclear safety, the update of the nuclear illustrative programme in the context of the second strategic energy review, the European Commission recommendation on criteria for the export of radioactive waste and spent fuel to third countries, and a communication on nuclear non-proliferation. The second concerns the International Atomic Energy Agency: the Joint Convention on the Safety of Spent Fuel Management and on the Safety of Radioactive Waste Management (third review meeting). (N.C.)

  10. International regulatory activities

    International Nuclear Information System (INIS)

    Anon.

    2002-01-01

    Different international regulatory activities are presented: recommendation on the protection of the public against exposure to radon in drinking water supplies, amendment to the legislation implementing the regulation on imports of agricultural products originating in third countries following the Chernobyl accident, resolution on the commission green paper towards a European strategy for the security of energy supply, declaration of mandatory nature of the international code for the safe carriage of packaged irradiated nuclear fuel, plutonium and high level radioactive wastes on board ships, adoption of action plan against nuclear terrorism. (N.C.)

  11. The core to regulatory reform

    International Nuclear Information System (INIS)

    Partridge, J.W. Jr.

    1993-01-01

    Federal Energy Regulatory Commission (FERC) Orders 436, 500, and 636, the Clean Air Act Amendments of 1990, Public Utility Holding Company Act reform, and the 1992 Energy Policy Act all can have significant effects on an LDC's operations. Such changes in an LDC's environments must be balanced by changes within the utility, its marketplace, and its state regulatory environment. The question is where to start. For Columbia Gas Distribution Cos., based in Columbus, OH, the new operating foundation begins with each employee. Internal strength is critical in designing initiatives that meet the needs of the marketplace and are well-received by regulators. Employees must understand not only the regulatory environment in which the LDC operates, but also how their work contributes to a positive regulatory relationship. To achieve this, Columbia initiated the COntinuing Regulatory Education program, or CORE, in 1991. CORE is a regulatory-focused, information-initiative program coordinated by Columbia's Regulatory Policy, Planning, and Government Affairs Department. The CORE programs can take many forms, such as emerging issue discussions, dialogues with regulators and key parties, updates on regulatory filings, regulatory policy meetings, and formal training classes. The speakers and discussion facilitators can range from human resource department trainers to senior officers, from regulatory department staff members to external experts, or from state commissioners to executives from other LDCs. The goals of CORE initiatives are to support a professional level of regulatory expertise through employee participation in well-developed regulatory programs presented by credible experts, and to encourage a constructive state regulatory environment founded on communication and cooperation. CORE achieves these goals via five program levels: introductory basics, advanced learning, professional expertise, cross-functional dialogues, and external idea exchanges

  12. Politically Induced Regulatory Risk and Independent Regulatory Agencies

    OpenAIRE

    Strausz, Roland

    2015-01-01

    Uncertainty in election outcomes generates politically induced regulatory risk. Political parties' risk attitudes towards such risk depend on a fluctuation effect that hurts both parties and an output-expansion effect that benefits at least one party. Notwithstanding the parties' risk attitudes, political parties have incentives to negotiate away all regulatory risk by pre-electoral bargaining. Efficient pre-electoral bargaining outcomes fully eliminate politically induced regulatory risk. P...

  13. Regulatory networks, legal federalism, and multi-level regulatory systems

    OpenAIRE

    Kerber, Wolfgang; Wendel, Julia

    2016-01-01

    Transnational regulatory networks play important roles in multi-level regulatory regimes such as the European Union. In this paper we analyze the role of regulatory networks from the perspective of the economic theory of legal federalism. Often sophisticated intermediate institutional solutions between pure centralisation and pure decentralisation can help to solve complex tradeoff problems between the benefits and problems of centralised and decentralised solutions. Drawing upon the insight...

  14. The African Health Profession Regulatory Collaborative for Nurses and Midwives

    Directory of Open Access Journals (Sweden)

    McCarthy Carey F

    2012-08-01

    Background: More than thirty-five sub-Saharan African countries have severe health workforce shortages. Many also struggle with a mismatch between the knowledge and competencies of health professionals and the needs of the populations they serve. Addressing these workforce challenges requires collaboration among health and education stakeholders and reform of health worker regulations. Health professional regulatory bodies, such as nursing and midwifery councils, have the mandate to reform regulations yet often do not have the resources or expertise to do so. In 2011, the United States of America Centers for Disease Control and Prevention began a four-year initiative to increase the collaboration among national stakeholders and help strengthen the capacity of health professional regulatory bodies to reform national regulatory frameworks. The initiative is called the African Health Regulatory Collaborative for Nurses and Midwives. This article describes the African Health Regulatory Collaborative for Nurses and Midwives and discusses its importance in implementing and sustaining national, regional, and global workforce initiatives. Discussion: The African Health Profession Regulatory Collaborative for Nurses and Midwives convenes leaders responsible for regulation from 14 countries in East, Central and Southern Africa. It provides a high profile, south-to-south collaboration to assist countries in implementing joint approaches to problems affecting the health workforce. Implemented in partnership with Emory University, the Commonwealth Secretariat, and the East, Central and Southern African College of Nursing, this initiative also supports four to five countries per year in implementing locally-designed regulation improvement projects. Over time, the African Health Regulatory Collaborative for Nurses and Midwives will help to increase the regulatory capacity of health professional organizations and ultimately improve regulation and

  15. Visions of regulatory renewal

    International Nuclear Information System (INIS)

    Edgeworth, A.

    1998-01-01

    The economic contribution of the CEPA (Canadian Energy Pipeline Association) member companies to Canada's trade balance was discussed. CEPA member companies transport 95 per cent of the crude oil and natural gas produced in Canada to domestic and export markets. This represents a total of 5.6 Tcf of gas annually. Half of Canada's natural gas and oil production is exported to U.S. markets. All of these exports are transported by pipeline. CEPA member companies operate 90,000 km of pipeline from British Columbia to Quebec. Expansions are needed as a result of a significant increase in demand for natural gas and crude oil since 1990. Several issues exist for regulatory renewal. They include the need to create a level playing field, the overseeing of tolls and contract renewal terms, changing risk/reward trade-offs, the right to confidentiality of information and price discovery mechanisms. The drivers for regulatory reform at Westcoast Energy are the need for pricing flexibility, customers' desire for toll certainty, decontracting and opposition to rolled-in expansions for gathering and processing. An overview of Westcoast Energy's negotiated toll settlement, its implications, and the components of Westcoast Energy's 'light handed regulation' (LHR) was presented

  16. The regulatory dynamic

    International Nuclear Information System (INIS)

    Dybwad, C.

    2001-01-01

    An outline of the activities and efforts expended by the National Energy Board to adjust to the changing natural gas market was provided in this presentation. The author began by defining the role of the National Energy Board in energy markets. It must ensure the adoption of rules and procedures that result in a more competitive and efficient market. Light-handed regulatory techniques are the norm, and the National Energy Board is now committed to facilitating the availability and flow of information so that all parties know where opportunities exist, the terms offered to buy or sell goods and services, their quality and costs. It will specialize in providing new participants with information on the workings of the market, who the players are, the regulatory processes in place, and how, when and where the market can be accessed. The manner in which the Board deals with information was reviewed, providing examples along the way to clarify some points. Some of the documents produced by the National Energy Board are being reviewed with the intent of making them easier to read and understand. Audio streaming over the Internet is another avenue being pursued to ensure individuals can listen in real time to hearings without having to be present in the room. The National Energy Board is also exploring alternative dispute resolution techniques. Consultation with energy market participants represents another facet of these efforts to be more accessible and responsive

  17. Regulatory inspection of BARC facilities

    International Nuclear Information System (INIS)

    Rajdeep; Jayarajan, K.

    2017-01-01

    Nuclear and radiation facilities are sited, constructed, commissioned, operated and decommissioned in conformity with the current safety standards and codes. Regulatory bodies follow different means to ensure compliance with the standards for the safety of the personnel, the public and the environment. Regulatory Inspection (RI) is one of the important measures employed by regulatory bodies to obtain the safety status of a facility or project and to verify the fulfilment of the conditions stipulated in the consent

  18. Strengthening of the nuclear safety regulatory body. Field evaluation review

    International Nuclear Information System (INIS)

    1996-10-01

    As a result of a request from the Preparation Committee of the Nuclear Regulatory Authority (NRA) in 1992, and as recommended by the CEC/RAMG (Commission of European Communities/Regulatory Assistance Management Group) and the Agency mission in July 1993 to the Slovak Republic, the project SLR/9/005 was approved in 1993 as a model project for the period 1994-1996. Current budget is $401,340 and disbursements to date amount to $312,873. The project time schedule has been extended to 1997. The major conclusions of this evaluation are as follows: The project responded to an urgent national need, as well as to a statutory mandate of the Agency, and was adequately co-ordinated with other international assistance programmes to NRA. The project was designed as a structured programme of assistance by means of expert missions, scientific visits and a limited amount of equipment, acting upon several key areas of NRA regulatory responsibilities. Agency assistance was provided in a timely manner. A high concentration of expert missions was noticed at the initial stages of the project, which posed some management problems. This was corrected to some extent in the course of implementation. Additionally, some overlapping of expert mission recommendations suggests that improvements are needed in the design of such missions. The exposure to international regulatory practice and expertise has resulted in substantial developments of NRA, both in organizational and operational terms. The project can claim to have contributed to NRA having gained governmental and international confidence. NRA's role in the safety assessment of Bohunice V1 reconstruction, as well as in Bohunice V2 safety review, Bohunice A1 decommissioning and in informing the public, also points to the success achieved by the project. The institutional and financial support of the Government contributed decisively to the project achievements. (author). Figs, tabs

  19. National legislative and regulatory activities

    International Nuclear Information System (INIS)

    2012-01-01

    This section gathers the following national legislative and regulatory activities sorted by country: Bulgaria: General legislation; Czech Republic: General legislation; France: General legislation, Regulatory infrastructure and activity; Germany: General legislation; India: Liability and compensation, Organisation and structure; Ireland: Radiation protection, General legislation; Korea (Republic of): Organisation and structure; Lithuania: Regulatory infrastructure and activity, Radioactive waste management, Radiation protection, international cooperation, Nuclear safety; Poland: General legislation; Romania: Environmental protection; Russian Federation: Radioactive waste management; Slovenia: Nuclear safety; Spain: Liability and compensation, Nuclear security; Sweden: Nuclear safety; Turkey: Radiation protection, Regulatory infrastructure and activity, Nuclear safety, Liability and compensation; United States: General legislation

  20. Regulatory actions post - Fukushima

    International Nuclear Information System (INIS)

    Ciurea Ercau, C.

    2013-01-01

    The paper presents the results of the safety reviews performed in Romania after the Fukushima accident and the resulting actions for improving the safety. The actions taken by the National Commission for Nuclear Activities Control (CNCAN) to improve the regulatory framework include the development of new regulations and the enhancement of inspection practices, taking account of the lessons learned from the Fukushima accident. A regulation on the response to transients, accidents and emergency situations at nuclear power plants has been developed, which includes requirements on transient and accident scenarios that have to be covered by the Emergency Operating Procedures (EOPs), accident scenarios to be covered by the Severe Accident Management Guidelines (SAMGs), emergency situations to be covered by the on-site emergency response plan and emergency response procedures. (authors)

  1. International regulatory activities

    International Nuclear Information System (INIS)

    Anon.

    2010-01-01

    Concerning international regulatory activities, for the European Atomic Energy Community we find the entry into force of the Lisbon Treaty (2009), which amends the Treaty on European Union and replaces the Treaty establishing the European Community with the new Treaty on the Functioning of the European Union, as well as an amendment to the Council regulation on the conditions governing imports of agricultural products originating in third countries following the accident at the Chernobyl nuclear power station (2009). For the International Atomic Energy Agency, an open-ended meeting of technical and legal experts is reported, for sharing information on states' implementation of the Code of Conduct on the Safety and Security of Radioactive Sources and its supplementary Guidance on the Import and Export of Radioactive Sources (2010). (N.C.)

  2. Regulatory framework; Marco regulatorio

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2009-10-15

    This chapter is based on work performed in distinct phases. The first phase consisted of an analysis of the regulatory legislation in force in Brazil for the sugar-alcohol sector since the beginning of the twentieth century. This analysis allowed the identification of gaps and of legal provisions related to the aspects studied that were considered problematic for the expansion of the sector. In the second phase, related treaties and international agreements were studied, along with possible obstacles to Brazilian bioethanol exports to the international market. Initiatives were examined in the European Union, the United States of America, the Caribbean and the countries of sub-Saharan Africa. In this phase, policies were identified relating to incentives for, and adoption of, biofuels blended with gasoline in countries or groups of countries considered key to the consolidation of bioethanol as a world commodity.

  3. International regulatory activities

    International Nuclear Information System (INIS)

    Anon.

    2003-01-01

    Among international regulatory activities are resolutions adopted by the IAEA General Conference (2003). For the European Union, there are proposals for directives on nuclear safety and radioactive waste management, a new regulation on the application of EURATOM safeguards, control of high-activity sealed radioactive sources, a recommendation on the protection and information of the public with regard to the continued contamination of certain wild food products following the Chernobyl accident, proposals for decisions authorizing the Member States to sign and ratify the Protocol to amend the Paris Convention, proposals for a directive on environmental liability with regard to the prevention and remedying of environmental damage, and a proposal for a regulation on the law applicable to non-contractual obligations. (N.C.)

  4. Nuclear Regulatory Commission Issuances

    International Nuclear Information System (INIS)

    1992-01-01

    This is the thirty-sixth volume of issuances (1-396) of the Nuclear Regulatory Commission and its Atomic Safety and Licensing Boards, Administrative Law Judges, and Office Directors. It covers the period from July 1, 1992-December 31, 1992. Atomic Safety and Licensing Boards are authorized by Section 191 of the Atomic Energy Act of 1954. These Boards, comprised of three members, conduct adjudicatory hearings on applications to construct and operate nuclear power plants and related facilities and issue initial decisions which, subject to internal review and appellate procedures, become the final Commission action with respect to those applications. Boards are drawn from the Atomic Safety and Licensing Board Panel, comprised of lawyers, nuclear physicists and engineers, environmentalists, chemists, and economists. The Atomic Energy Commission first established Licensing Boards in 1962 and the Panel in 1967

  5. System engineering in the Nuclear Regulatory Commission licensing process: Program architecture process and structure

    International Nuclear Information System (INIS)

    Romine, D.T.

    1989-01-01

    In October 1987, the U.S. Nuclear Regulatory Commission (NRC) established the Center for Nuclear Waste Regulatory Analyses at Southwest Research Institute in San Antonio, Texas. The overall mission of the center is to provide a sustained level of high-quality research and technical assistance in support of NRC regulatory responsibilities under the Nuclear Waste Policy Act (NWPA). A key part of that mission is to assist the NRC in the development of the program architecture - the systems approach to regulatory analysis for the NRC high-level waste repository licensing process - and the development and implementation of the computer-based Program Architecture Support System (PASS). This paper describes the concept of program architecture, summarizes the process and basic structure of the PASS relational data base, and describes the applications of the system

  6. Regulatory focus in group contexts

    NARCIS (Netherlands)

    Faddegon, Krispijn Johannes

    2009-01-01

    The thesis examines the influence of group processes on the regulatory focus of individual group members. It is demonstrated that the group situation can affect group members' regulatory focus both in a top-down fashion (via the identity of the group) and in a bottom-up fashion (emerging from the

  7. Disclosure as a regulatory tool

    DEFF Research Database (Denmark)

    Sørensen, Karsten Engsig

    2006-01-01

    The chapter analyses how disclosure can be used as a regulatory tool and how it has been applied so far in the areas of financial market law and consumer law.

  8. 77 FR 44562 - Housing Assistance Due to Structural Damage

    Science.gov (United States)

    2012-07-30

    ... instructions for submitting comments. Mail/Hand Delivery/Courier: Regulatory Affairs Division, Office of Chief..., fuel, or clothing costs. Specifically, FEMA provides the following types of housing assistance... Index (CPI). Second, PKEMRA amended subsection 408(c)(4) of the Stafford Act by removing the word...

  9. 75 FR 7526 - Withdrawal of Regulatory Guide

    Science.gov (United States)

    2010-02-19

    ...'s Electronic Reading Room at http://www.nrc.gov/reading-rm/doc-collections . Regulatory guides are... NUCLEAR REGULATORY COMMISSION [NRC-2010-0052] Withdrawal of Regulatory Guide AGENCY: Nuclear Regulatory Commission. ACTION: Withdrawal of Regulatory Guide 1.56, ``Maintenance of Water Purity in Boiling...

  10. 12 CFR 562.2 - Regulatory reports.

    Science.gov (United States)

    2010-01-01

    ... § 562.2 Regulatory reports. (a) Definition and scope. This section applies to all regulatory reports, as... (TFR) are examples of regulatory reports. Regulatory reports are regulatory documents, not accounting... limited to, the accounting instructions provided in the TFR, guidance contained in OTS regulations...

  11. Regulatory research and support program for 1992/93 - project descriptions. Information bulletin

    International Nuclear Information System (INIS)

    1992-01-01

    The Regulatory Research and Support Program (RSP) is intended to augment and extend the Atomic Energy Control Board's regulatory program beyond the capability of in-house resources. The overall objective of the research and support program is to produce pertinent and independent information that will assist the Board and its staff in making correct, timely and credible decisions on regulating nuclear facilities and materials

  12. Regulatory research and support program for 1992/93 - project descriptions. Information bulletin

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1992-03-02

    The Regulatory Research and Support Program (RSP) is intended to augment and extend the Atomic Energy Control Board's regulatory program beyond the capability of in-house resources. The overall objective of the research and support program is to produce pertinent and independent information that will assist the Board and its staff in making correct, timely and credible decisions on regulating nuclear facilities and materials.

  13. Virginia Power's regulatory reduction program

    International Nuclear Information System (INIS)

    Miller, G.D.

    1996-01-01

    Virginia Power has two nuclear plants, North Anna and Surry Power Stations, which have two units each for a total of four nuclear units. In 1992, the Nuclear Regulatory Commission solicited comments from the nuclear industry to obtain its ideas for reducing the regulatory burden on nuclear facilities. Pursuant to the new regulatory climate, Virginia Power developed an internal program to evaluate and assess the regulatory and self-imposed requirements to which it was committed, and to pursue regulatory relief or internal changes where possible and appropriate. The criteria were that public safety must be maintained and that savings must be significant. Up to the date of the conference, over US$22 million in one-time savings and US$2.75 million in annual savings had been achieved.

  14. Anti-regulatory T cells

    DEFF Research Database (Denmark)

    Andersen, Mads Hald

    2017-01-01

    Our initial understanding of immune-regulatory cells was based on the discovery of suppressor cells that assure peripheral T-cell tolerance and promote immune homeostasis. Research has particularly focused on the importance of regulatory T cells (Tregs) for immune modulation, e.g. directing host responses to tumours or inhibiting autoimmunity development. However, recent studies report the discovery of self-reactive pro-inflammatory T cells, termed anti-regulatory T cells (anti-Tregs), that target immune-suppressive cells. Thus, regulatory cells can now be defined as both cells that suppress immune reactions as well as effector cells that counteract the effects of suppressor cells and support immune reactions. Self-reactive anti-Tregs have been described that specifically recognize human leukocyte antigen-restricted epitopes derived from proteins that are normally expressed by regulatory immune cells...

  15. Promoting and assessment of safety culture within regulatory body

    International Nuclear Information System (INIS)

    Awasthi, Sumit; Bhattacharya, D.; Koley, J.; Krishnamurthy, P.R.

    2015-01-01

    Regulators have an important role to play in assisting organizations under their jurisdiction to develop positive safety cultures. It is therefore essential for the regulator to have a robust safety culture as an inherent strategy and to communicate this strategy to the organizations it supervises. The Atomic Energy Regulatory Board (AERB) emphasizes that every utility should institute a good safety culture during the various stages of an NPP. The regulatory requirements for establishing an organisational safety culture within a utility at different stages are delineated in the various AERB safety codes, which are presented in the paper. Although the review and assessment of safety culture is part of AERB’s continual safety supervision through its existing review mechanism, AERB does not use any specific indicators for safety culture assessment. However, establishing and nurturing a good safety culture within AERB helps in encouraging the utility to institute the same. At the induction level, AERB provides its staff with training for regulatory orientation, which includes a specific course on safety culture. Subsequently, junior staff are mentored by seniors, who involve them in various regulatory processes and place them as observers during the regulatory decision-making process. Further, AERB has established a formal procedure for assessing and improving safety culture among its staff as a management system process. The paper describes, as a case study, this safety culture assessment process established within AERB.

  16. Regulatory inspection of nuclear power plants in NEA member countries

    International Nuclear Information System (INIS)

    Gronow, W.S.; Ilani, O.

    1977-01-01

    The increasing use of nuclear power and public interest in the safety controls led to the proposal by the sub-Committee on Licensing of the NEA Committee on the Safety of Nuclear Installations for a specialist meeting on regulatory inspection practices. This report, which was prepared at the request of the sub-Committee to assist in the exchange of views and experience at the meeting, reviews the response to a questionnaire on the systems employed, the scope and objectives, and the effort involved in regulatory inspection throughout all stages of the life of a nuclear power plant. Other aspects of regulatory inspection activities are discussed, including documentation, procedures for changes in technical specification and modifications to plant, powers and duties of regulatory inspection personnel, and actions to be taken in the event of an accident or emergency. The report concludes with some comments on those aspects of regulatory inspection practices where further information and an exchange of experience might prove to be beneficial to Member countries. (author)

  17. Institutionalizing Security Force Assistance

    National Research Council Canada - National Science Library

    Binetti, Michael R

    2008-01-01

    .... It looks at the manner in which security assistance guidance is developed and executed. An examination of national level policy and the guidance from senior military and civilian leaders highlights the important role of Security Force Assistance...

  18. ForeignAssistance.gov

    Data.gov (United States)

    US Agency for International Development — ForeignAssistance.gov provides a view of U.S. Government foreign assistance funds across agencies and enables users to explore, analyze, and review aid investments...

  19. Partnership for Prescription Assistance

    Science.gov (United States)

    ... may use our name without our permission. The Partnership for Prescription Assistance will help you find the ... Start living better. The Partnership for Prescription Assistance helps qualifying patients without prescription ...

  20. Assisted delivery with forceps

    Science.gov (United States)

    ... page: //medlineplus.gov/ency/patientinstructions/000509.htm Assisted delivery with forceps ... called vacuum assisted delivery. When is a Forceps Delivery Needed? Even after your cervix is fully dilated ( ...