text-mining assisted regulatory: Topics by WorldWideScience.org

Sample records for text-mining assisted regulatory

Text Mining.

Science.gov (United States)

Trybula, Walter J.

1999-01-01

Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…
Mining biological networks from full-text articles.

Science.gov (United States)

Czarnecki, Jan; Shepherd, Adrian J

2014-01-01

The study of biological networks is playing an increasingly important role in the life sciences. Many different kinds of biological system can be modelled as networks; perhaps the most important examples are protein-protein interaction (PPI) networks, metabolic pathways, gene regulatory networks, and signalling networks. Although much useful information is easily accessible in public databases, a lot of extra relevant data lies scattered in numerous published papers. Hence there is a pressing need for automated text-mining methods capable of extracting such information from full-text articles. Here we present practical guidelines for constructing a text-mining pipeline from existing code and software components capable of extracting PPI networks from full-text articles. This approach can be adapted to tackle other types of biological network.
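
As a rough illustration of what the simplest stage of such a pipeline might look like, the sketch below counts protein pairs that are mentioned in the same sentence. This is a naive co-occurrence baseline and not the authors' pipeline; the lexicon `PROTEINS`, the sample text, and the function names are assumptions made for this example.

```python
import itertools
import re
from collections import Counter

# Hypothetical protein lexicon; a real pipeline would use a trained
# named-entity recognizer rather than a fixed set of names.
PROTEINS = {"BRCA1", "BARD1", "TP53", "MDM2"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrence_edges(text, lexicon=PROTEINS):
    """Count unordered protein pairs that appear in the same sentence."""
    edges = Counter()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        found = sorted(tokens & lexicon)  # sorted so each pair has one key
        for pair in itertools.combinations(found, 2):
            edges[pair] += 1
    return edges


# Illustrative text only, not taken from any article in this listing.
sample = ("BRCA1 binds BARD1 in the nucleus. MDM2 ubiquitinates TP53. "
          "BRCA1 and BARD1 form a heterodimer.")
edges = cooccurrence_edges(sample)
```

Co-occurrence over-generates candidate interactions, so a fuller pipeline would normally add a relation-extraction step after it.
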
Text-mining-assisted biocuration workflows in Argo

Science.gov (United States)

Rak, Rafal; Batista-Navarro, Riza Theresa; Rowley, Andrew; Carter, Jacob; Ananiadou, Sophia

2014-01-01

Biocuration activities have been broadly categorized into the selection of relevant documents, the annotation of biological concepts of interest and identification of interactions between the concepts. Text mining has been shown to have a potential to significantly reduce the effort of biocurators in all the three activities, and various semi-automatic methodologies have been integrated into curation pipelines to support them. We investigate the suitability of Argo, a workbench for building text-mining solutions with the use of a rich graphical user interface, for the process of biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics to form task-specific processing units. A built-in manual annotation editor is the single most used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components. Apart from syntactic and semantic analytics, the ever-growing library of components includes several data readers and consumers that support well-established as well as emerging data interchange formats such as XMI, RDF and BioC, which facilitate the interoperability of Argo with other platforms or resources. To validate the suitability of Argo for curation activities, we participated in the BioCreative IV challenge whose purpose was to evaluate Web-based systems addressing user-defined biocuration tasks. Argo proved to have the edge over other systems in terms of flexibility of defining biocuration tasks. As expected, the versatility of the workbench inevitably lengthened the time the curators spent on learning the system before taking on the task, which may have affected the usability of Argo. The participation in the challenge gave us an opportunity to gather valuable feedback and identify areas of improvement, some of which have already been introduced. Database URL: http://argo.nactem.ac.uk PMID
Biomedical text mining and its applications in cancer research.

Science.gov (United States)

Zhu, Fei; Patumcharoenpol, Preecha; Zhang, Cheng; Yang, Yang; Chan, Jonathan; Meechai, Asawin; Vongsangnak, Wanwipa; Shen, Bairong

2013-04-01

Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100 years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research. Copyright © 2012 Elsevier Inc. All rights reserved.
tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

Science.gov (United States)

Cejuela, Juan Miguel; McQuilton, Peter; Ponting, Laura; Marygold, Steven J; Stefancsik, Raymund; Millburn, Gillian H; Rost, Burkhard

2014-01-01

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the 'tagtog' system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. DATABASE URL: www.tagtog.net, www.flybase.org.
Text Mining in Organizational Research.

Science.gov (United States)

Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N

2018-07-01

Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
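
To make technique (b), distance and similarity computing, concrete, here is a minimal sketch of cosine similarity between term-frequency vectors. The two job-vacancy strings and the function names are invented for illustration and are not drawn from the article's dataset.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical vacancy snippets: four of the five terms are shared.
vacancy_1 = term_frequencies("data analyst with python skills")
vacancy_2 = term_frequencies("python developer with data skills")
similarity = cosine_similarity(vacancy_1, vacancy_2)
```

Raw term counts are the simplest possible weighting; analyses of this kind often switch to TF-IDF weights so that very common words contribute less.
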
A Customizable Text Classifier for Text Mining

Directory of Open Access Journals (Sweden)

Yun-liang Zhang

2007-12-01

Text mining deals with complex and unstructured texts. Usually a particular collection of texts that is specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard with which to choose the texts based on demand and abundance of materials. The performance of the classifier varies with the user's choice.
PubRunner: A light-weight framework for updating text mining results.

Science.gov (United States)

Anekalla, Kishore R; Courneya, J P; Fiorini, Nicolas; Lever, Jake; Muchow, Michael; Busby, Ben

2017-01-01

Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In order for biomedical text mining to become an indispensable tool used by researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with an existing text mining tool. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This shows a proof of concept that we hope will encourage text mining developers to build tools that truly will aid biologists in exploring the latest publications.
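
The four-step workflow described in this abstract (fetch the latest abstracts, run a user-defined tool, push the results, publicize their location) can be sketched as a small pipeline of pluggable callables. This is not PubRunner's actual code or interface; the class `UpdatePipeline`, the stub functions, and the `memory://` location are all hypothetical, and no network access is performed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UpdatePipeline:
    """Hypothetical four-stage update loop; each stage is injected."""
    fetch: Callable[[], List[str]]              # get the latest abstracts
    tool: Callable[[List[str]], Dict[str, int]]  # user-defined text mining tool
    publish: Callable[[Dict[str, int]], str]    # upload results, return location
    announce: Callable[[str], None]             # publicize the location

    def run(self) -> str:
        abstracts = self.fetch()
        results = self.tool(abstracts)
        location = self.publish(results)
        self.announce(location)
        return location


# In-memory stand-ins for the real stages (assumptions for this sketch).
store: Dict[str, Dict[str, int]] = {}
announced: List[str] = []


def fake_fetch() -> List[str]:
    return ["gene expression in yeast", "gene regulation in mice"]


def word_count_tool(abstracts: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for abstract in abstracts:
        for word in abstract.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def fake_publish(results: Dict[str, int]) -> str:
    location = "memory://results/1"
    store[location] = results
    return location


pipeline = UpdatePipeline(fake_fetch, word_count_tool, fake_publish, announced.append)
location = pipeline.run()
```

Keeping each stage as an injected callable mirrors the idea that the framework wraps an existing tool rather than replacing it; a scheduler would simply call `run()` at regular intervals.
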
SparkText: Biomedical Text Mining on Big Data Framework.

Science.gov (United States)

Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M

Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
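
For readers unfamiliar with the classifiers named here, the following is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, written with the Python standard library only. It is not SparkText's implementation, which the abstract says runs on Apache Spark; the class name and the four toy training sentences are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set, invented for this sketch.
docs = [
    "mammography screening detected a breast tumor",
    "breast carcinoma treated with tamoxifen",
    "elevated psa suggested prostate carcinoma",
    "prostate biopsy after psa test",
]
labels = ["breast", "breast", "prostate", "prostate"]
model = NaiveBayes().fit(docs, labels)
prediction = model.predict("psa test and prostate biopsy")
```

The abstract's timing figures also allow a quick sanity check: more than 11 hours against roughly 6 minutes corresponds to a speedup of more than 660 / 6 = 110 times.
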
SparkText: Biomedical Text Mining on Big Data Framework.

Directory of Open Access Journals (Sweden)

Zhan Ye

Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
SIAM 2007 Text Mining Competition dataset

Data.gov (United States)

National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...
SparkText: Biomedical Text Mining on Big Data Framework

Science.gov (United States)

He, Karen Y.; Wang, Kai

2016-01-01

Background: Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
Text mining from ontology learning to automated text processing applications

CERN Document Server

Biemann, Chris

2014-01-01

This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects
Regulatory challenges of historic uranium mines in Canada

International Nuclear Information System (INIS)

Clement, C.H.; Stenson, R.E.

2002-01-01

The radium and uranium mining industry began in Canada in 1930 with the discovery of the Port Radium deposit in the Northwest Territories. During the 1950s more uranium mines opened across Canada. Most of these mines ceased operation by the end of the 1960s. Some were remediated by their owners, while others were abandoned. The Atomic Energy Control Board (AECB), predecessor to the Canadian Nuclear Safety Commission (CNSC), was created in 1946. However, it was not until the mid-1970s that the AECB took an active role in regulating health, safety and environmental aspects of uranium mining; so many of the older mines have never been licensed. With the coming into force of the Nuclear Safety and Control Act (NSCA) in May 2000, this situation has been reviewed. The NSCA requires a licence for the possession of nuclear substances (including uranium mine tailings), or the decommissioning of nuclear facilities (including uranium mines and mills). Furthermore, governments (federal and provincial) are also subject to the NSCA, a change from the previous legislation. The CNSC has an obligation to assess these sites, regardless of ownership, and to proceed with licensing or other appropriate regulatory action. The CNSC has reviewed the status of the twenty sites in Canada where uranium milling took place historically. Eight are already licensed. Licensing actions are being pursued at the other sites. A review of nearly 100 small uranium mining or exploration sites is also underway to determine the most appropriate regulatory approach. This paper focuses on regulatory issues surrounding the historic mining and milling sites, and the regulatory approach being taken, including licensing provincial and federal government bodies who own some of the sites, and ensuring the safe management of sites that were abandoned. (author)
Text Mining Applications and Theory

CERN Document Server

Berry, Michael W

2010-01-01

Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives. The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning
Working with text tools, techniques and approaches for text mining

CERN Document Server

Tourte, Gregory J L

2016-01-01

Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTeM (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...
Contextual Text Mining

Science.gov (United States)

Mei, Qiaozhu

2009-01-01

With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…
Text mining for the biocuration workflow.

Science.gov (United States)

Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
Text mining for the biocuration workflow

Science.gov (United States)

Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129
A STUDY OF TEXT MINING METHODS, APPLICATIONS, AND TECHNIQUES

OpenAIRE

R. Rajamani & S. Saranya

2017-01-01

Data mining is used to extract useful information from large amounts of data. It is used to implement and solve different types of research problems. The research areas related to data mining include text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining, also referred to as text data mining, is also called knowledge discovery in text (KDT) or intelligent text analysis. T...

Mining Contratação License in the New Regulatory Framework of Brazilian Mining: Some Notes on the Institutes of Research Authorization and Mining Concession

Directory of Open Access Journals (Sweden)

Adhemar Ronquim Filho

2013-06-01

Given the importance of mining nowadays, the Government seeks ways to stimulate its growth, focusing on strengthening research and the advancement of mineral processing, the basic items for speeding up this activity in a profitable way. In this sense, the discussions on the crystallization of a new regulatory framework for Brazilian mining have deepened and, despite gathering a significant number of proposals, the framework does not have a closed text and is currently far from obtaining approval or a final word (despite the urgency). However, the analysis of the proposals that have been presented reveals that there is an intention to institute new rules for the modernization of Brazilian mining, and this paper has the purpose of suggesting ways to reconcile conflicts permeated by various dissonant interests that surround Brazilian mining at this time. It should be emphasized that, given the lack of official disclosure of the amendments proposed, the approach is limited to what has been released by the MME (Ministry of Mines and Energy) and by the studies that have already been presented by experts in the field (connected to government and/or private businesses). The discussion is restricted to the changes to be implemented with the new regulatory framework, highlighting points to be observed; among the topics that require mandatory update, we emphasize the changes in the procedures for exploration permits and mining.
77 FR 55430 - Arkansas Regulatory Program and Abandoned Mine Land Reclamation Plan

Science.gov (United States)

2012-09-10

... of its regulatory program and abandoned mine land reclamation plan, make grammatical changes, correct... portions of its regulatory program and abandoned mine land reclamation plan, make grammatical changes... Streams. PART 785--REQUIREMENTS FOR PERMITS FOR SPECIAL CATEGORIES OF MINING 785.13, 785.14, 785.15...
[Text mining, a method for computer-assisted analysis of scientific texts, demonstrated by an analysis of author networks].

Science.gov (United States)

Hahn, P; Dullweber, F; Unglaub, F; Spies, C K

2014-06-01

Searching for relevant publications is becoming more difficult with the increasing number of scientific articles. Text mining as a specific form of computer-based data analysis may be helpful in this context. Highlighting relations between authors and finding relevant publications concerning a specific subject using text analysis programs are illustrated graphically with two worked examples. © Georg Thieme Verlag KG Stuttgart · New York.
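
One simple way to derive author relations of the kind mentioned here is to build a co-authorship network from the author lists of bibliographic records. The sketch below is an assumption about how such an analysis could start, not the method used in the article; the placeholder author names and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(author_lists):
    """Weight each unordered author pair by the number of shared papers."""
    edges = Counter()
    for authors in author_lists:
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


def degrees(edges):
    """Number of distinct co-authors per author."""
    degree = Counter()
    for first, second in edges:
        degree[first] += 1
        degree[second] += 1
    return degree


# Placeholder records; real input would come from parsed author fields.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C", "Author D"],
]
edges = coauthor_edges(papers)
degree = degrees(edges)
```

The resulting edge weights and degrees are what a graph-drawing tool would then lay out to produce the graphical view.
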
Biomarker Identification Using Text Mining

Directory of Open Access Journals (Sweden)

Hui Li

2012-01-01

Identifying molecular biomarkers has become one of the important tasks for scientists to assess the different phenotypic states of cells or organisms correlated to the genotypes of diseases from large-scale biological data. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our method of text mining provides a highly reliable approach to discover the biomarkers in the PubMed database.
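
The abstract does not say how the dictionary and the finite state machine fit together, so the following is only one plausible reading: compile the dictionary terms into a token-level trie, which acts as a deterministic finite automaton, and scan the text for the longest match at each position. The three dictionary entries, the sample sentence, and all function names are assumptions made for this sketch.

```python
import re

END = None  # marker key for accepting states; never collides with a token


def build_automaton(terms):
    """Compile dictionary terms into a token-level trie (a simple DFA)."""
    root = {}
    for term in terms:
        node = root
        for token in term.lower().split():
            node = node.setdefault(token, {})
        node[END] = term  # accepting state stores the canonical term
    return root


def find_mentions(text, automaton):
    """Scan left to right, emitting the longest dictionary match at each position."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9-]+", text)]
    mentions = []
    i = 0
    while i < len(tokens):
        node, j, last = automaton, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                last = (node[END], j)  # remember the longest match so far
        if last:
            mentions.append(last[0])
            i = last[1]  # resume after the matched span
        else:
            i += 1
    return mentions


# Hypothetical dictionary entries and sentence.
automaton = build_automaton(["prostate specific antigen", "CA-125", "HER2"])
mentions = find_mentions(
    "Serum prostate specific antigen and HER2 were measured; CA-125 was not.",
    automaton,
)
```

Matching on whole tokens keeps a partial phrase such as "prostate cancer" from being reported as a hit for the longer dictionary term.
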
Chapter 16: text mining for translational bioinformatics.

Science.gov (United States)

Cohen, K Bretonnel; Hunter, Lawrence E

2013-04-01

Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Frontiers of biomedical text mining: current progress

Science.gov (United States)

Zweigenbaum, Pierre; Demner-Fushman, Dina; Yu, Hong; Cohen, Kevin B.

2008-01-01

It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year. PMID:17977867
Decommissioning of uranium mines and mills - Canadian regulatory approach and experience

International Nuclear Information System (INIS)

Whitehead, W.

1986-09-01

At the time of the recent closures of the Agnew Lake, Beaverlodge and Madawaska Mines Limited uranium mining and milling facilities, several relevant regulatory initiatives, including the development of decommissioning criteria, were underway, or contemplated. In the absence of precedents, the regulatory agencies and companies involved adopted approaches to the decommissioning of these facilities that reflected site specific circumstances, federal and provincial regulatory requirements, and generally accepted principles of good engineering practice and environmental protection. This paper summarizes related historical and current regulatory policies, requirements and guidelines; including those implemented at the three decommissioned sites
Text mining resources for the life sciences.

Science.gov (United States)

Przybyła, Piotr; Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia

2016-01-01

Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable, that is, those that have the crucial ability to share information, enabling smooth integration and reusability. © The Author(s) 2016. Published by Oxford University Press.
Text mining resources for the life sciences

Science.gov (United States)

Shardlow, Matthew; Aubin, Sophie; Bossy, Robert; Eckart de Castilho, Richard; Piperidis, Stelios; McNaught, John; Ananiadou, Sophia

2016-01-01

Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources. In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work. We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools. We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable—those that have the crucial ability to share information, enabling smooth integration and reusability. PMID:27888231
Cultural text mining: using text mining to map the emergence of transnational reference cultures in public media repositories

NARCIS (Netherlands)

Pieters, Toine; Verheul, Jaap

2014-01-01

This paper discusses the research project Translantis, which uses innovative technologies for cultural text mining to analyze large repositories of digitized public media, such as newspapers and journals. The Translantis research team uses and develops the text mining tool Texcavator, which is
Validation of an Improved Computer-Assisted Technique for Mining Free-Text Electronic Medical Records.

Science.gov (United States)

Duz, Marco; Marshall, John F; Parkin, Tim

2017-06-29

The use of electronic medical records (EMRs) offers opportunity for clinical epidemiological research. With large EMR databases, automated analysis processes are necessary but require thorough validation before they can be routinely used. The aim of this study was to validate a computer-assisted technique using commercially available content analysis software (SimStat-WordStat v.6 (SS/WS), Provalis Research) for mining free-text EMRs. The dataset used for the validation process included life-long EMRs from 335 patients (17,563 rows of data), selected at random from a larger dataset (141,543 patients, ~2.6 million rows of data) and obtained from 10 equine veterinary practices in the United Kingdom. The ability of the computer-assisted technique to detect rows of data (cases) of colic, renal failure, right dorsal colitis, and non-steroidal anti-inflammatory drug (NSAID) use in the population was compared with manual classification. The first step of the computer-assisted analysis process was the definition of inclusion dictionaries to identify cases, including terms identifying a condition of interest. Words in inclusion dictionaries were selected from the list of all words in the dataset obtained in SS/WS. The second step consisted of defining an exclusion dictionary, including combinations of words to remove cases erroneously classified by the inclusion dictionary alone. The third step was the definition of a reinclusion dictionary to reinclude cases that had been erroneously classified by the exclusion dictionary. Finally, cases obtained by the exclusion dictionary were removed from cases obtained by the inclusion dictionary, and cases from the reinclusion dictionary were subsequently reincluded using R v3.0.2 (R Foundation for Statistical Computing, Vienna, Austria). Manual analysis was performed as a separate process by a single experienced clinician reading through the dataset once and classifying each row of data based on the interpretation of the free-text
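The three-dictionary procedure in the abstract above amounts to set algebra over row identifiers: rows matched by the inclusion dictionary, minus rows matched by the exclusion dictionary, plus rows matched by the reinclusion dictionary. The following is a minimal Python sketch of that idea only. The study itself used SimStat-WordStat and R; the function names, the dictionary phrases, and the sample rows here are hypothetical.

```python
def matching_rows(rows, phrases):
    """Return the indices of rows that contain at least one phrase (case-insensitive)."""
    return {
        i for i, text in enumerate(rows)
        if any(p.lower() in text.lower() for p in phrases)
    }


def classify_cases(rows, inclusion, exclusion, reinclusion):
    """Apply the inclusion, exclusion and reinclusion dictionaries in turn."""
    included = matching_rows(rows, inclusion)
    excluded = matching_rows(rows, exclusion)
    reincluded = matching_rows(rows, reinclusion)
    # Remove excluded rows from the included ones, then add back the reincluded rows.
    return (included - excluded) | reincluded


# Made-up free-text rows and dictionaries, for illustration only.
rows = [
    "colic signs overnight, flunixin given",
    "no signs of colic on examination",
    "no signs of colic today; colic surgery last year",
    "routine vaccination",
]
cases = classify_cases(
    rows,
    inclusion=["colic"],
    exclusion=["no signs of colic"],
    reinclusion=["colic surgery"],
)
print(sorted(cases))
```

Plain substring matching is used to keep the sketch short; a real dictionary would likely need word-boundary handling and spelling variants.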
Regulatory philosophy and requirements for radiation control in Canadian uranium mine-mill facilities

International Nuclear Information System (INIS)

Dory, A.B.

1981-10-01

With the point made that radiation exposure is one of the health hazards of uranium mining and accordingly has to be controlled, the Canadian regulatory philosophy is outlined as it pertains to the uranium mining industry. Two extremes in regulatory approach are examined, and the joint regulatory process is explained. Two examples of poor management performance are given, and the role of mine unions in the regulatory process is touched upon. The development of new regulations to cover ventilation and employee training is sketched briefly. The author concludes with a general expression of objectives for the eighties which include improved personal dosimetry
Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development.

Science.gov (United States)

Elayavilli, Ravikumar Komandur; Liu, Hongfang

2016-01-01

Computational modeling of biological cascades is of great interest to quantitative biologists. Biomedical text has been a rich source for quantitative information. Gathering quantitative parameters and values from biomedical text is one significant challenge in the early steps of computational modeling as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain serves as an impediment to normalizing textual extractions to a standard representation. This may render textual extractions less meaningful to the domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the quantitative assertions extracted through text mining to a formal representation that may help in constructing ontology for ion channel events using a rule based approach. We have developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), together with the knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions on an independently annotated blind data set. We further made an initial attempt in formalizing the quantitative data assertions extracted from the biomedical text into a formal representation that offers potential to facilitate the integration of text mining into ontological workflow, a novel aspect of this study. This work is a case study where we created a platform that provides formal interaction between ontology development and text mining. We have achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in ontological
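The abstract above does not give the extraction rules themselves. As a rough illustration of what a rule for quantitative assertions can look like, the Python sketch below matches a known parameter name followed by a number and a unit. The parameter list, the unit list, the pattern, and the sample sentence are all assumptions made for this example and are not taken from ICEPO or the authors' system.

```python
import re

# Hypothetical vocabularies; a real system would draw these from an ontology.
PARAMETERS = ["half-activation voltage", "time constant"]
UNITS = ["mV", "ms"]

# One illustrative rule: <parameter> was|is|of <number> <unit>
PATTERN = re.compile(
    r"(?P<param>" + "|".join(re.escape(p) for p in PARAMETERS) + r")"
    r"\s+(?:was|is|of)\s+"
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>" + "|".join(re.escape(u) for u in UNITS) + r")\b"
)


def extract_assertions(sentence):
    """Return (parameter, value, unit) triples matched by the rule."""
    return [
        (m.group("param"), float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(sentence)
    ]


# Made-up sentence, for illustration only.
sentence = "The half-activation voltage was -25.3 mV and the time constant was 4 ms."
assertions = extract_assertions(sentence)
print(assertions)
```

Restricting the rule to a closed list of parameter names trades recall for precision, which is one common reason rule-based extractors are paired with an ontology.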
Text mining patents for biomedical knowledge.

Science.gov (United States)

Rodriguez-Esteban, Raul; Bundschus, Markus

2016-06-01

Biomedical text mining of scientific knowledge bases, such as Medline, has received much attention in recent years. Given that text mining is able to automatically extract biomedical facts that revolve around entities such as genes, proteins, and drugs, from unstructured text sources, it is seen as a major enabler to foster biomedical research and drug discovery. In contrast to the biomedical literature, research into the mining of biomedical patents has not reached the same level of maturity. Here, we review existing work and highlight the associated technical challenges that emerge from automatically extracting facts from patents. We conclude by outlining potential future directions in this domain that could help drive biomedical research and drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
Text mining for biology--the way forward

DEFF Research Database (Denmark)

Altman, Russ B; Bergman, Casey M; Blake, Judith

2008-01-01

This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify several broad themes, including the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text mining technology and inserting it into larger...
Automatic detection of adverse events to predict drug label changes using text and data mining techniques.

Science.gov (United States)

Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki

2013-11-01

The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from FAERS, Yellow Cards and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. A statistical approach was applied to the extracted datasets for signal detection and subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well-placed to support pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.
Unsupervised text mining for assessing and augmenting GWAS results.

Science.gov (United States)

Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence

2016-04-01

Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has helped identify many genes associated with multifactorial diseases. These studies make it possible to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value < 0.01). Clustering the observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
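The comparison described in the abstract above, cosine similarity among candidate genes judged against randomly drawn genes, can be sketched as a simple resampling test. The Python below is an illustration under stated assumptions, not the authors' framework: it assumes each gene is represented by a sparse term-weight vector (a dict), uses mean pairwise similarity as the test statistic, and estimates a one-sided p-value by drawing random gene sets of the same size. All names and the toy vectors are hypothetical.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all pairs; needs at least two vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def one_sided_p_value(candidates, background, n_draws=1000, seed=0):
    """Share of random gene sets that are at least as similar as the candidates."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(candidates)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(background, len(candidates))
        if mean_pairwise_similarity(sample) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)


# Toy vectors, for illustration only: candidates share a term, background genes do not.
candidates = [{"asthma": 1.0}, {"asthma": 2.0}, {"asthma": 3.0}]
background = [{f"term{i}": 1.0} for i in range(20)]
p = one_sided_p_value(candidates, background)
print(p)
```

With `n_draws=1000` the smallest reportable p-value is 1/1001, so the number of draws bounds the resolution of the test.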
Benchmarking infrastructure for mutation text mining.

Science.gov (United States)

Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

2014-02-25

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
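The abstract above does not list which performance metrics the SPARQL queries compute; precision, recall and F-measure are the usual choices for annotation benchmarks, and that is the assumption made here. The sketch below uses Python sets rather than RDF and SPARQL, and the (document, mutation) pairs are made up for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Compare system annotations with gold-standard annotations by exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (document id, mutation mention).
gold = {("doc1", "V600E"), ("doc1", "R175H"), ("doc2", "G12D"), ("doc2", "L858R")}
predicted = {("doc1", "V600E"), ("doc2", "G12D"), ("doc2", "T790M")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Exact matching on pairs is the simplest scoring scheme; benchmarks that ground mutations to database entries would compare normalized identifiers instead.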
Benchmarking infrastructure for mutation text mining

Science.gov (United States)

2014-01-01

Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text.

Science.gov (United States)

Garten, Yael; Altman, Russ B

2009-02-05

Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed in many journals. It is challenging, therefore, to identify important associations between drugs and molecular entities--particularly genes and gene variants, and thus these critical connections are often lost. Text mining techniques can allow us to convert the free-style text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input into text mining engines is key, as literature abstracts often do not contain sufficient information to identify these pharmacogenomic associations. Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively. Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.

The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis.

Science.gov (United States)

Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J; Inzé, Dirk; Van de Peer, Yves

2013-03-01

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.
Systematic analysis of molecular mechanisms for HCC metastasis via text mining approach.

Science.gov (United States)

Zhen, Cheng; Zhu, Caizhong; Chen, Haoyang; Xiong, Yiru; Tan, Junyuan; Chen, Dong; Li, Jin

2017-02-21

The aim was to systematically explore the molecular mechanism of hepatocellular carcinoma (HCC) metastasis and to identify regulatory genes with text mining methods. Genes with the highest frequencies and significant pathways related to HCC metastasis were listed. A handful of proteins, such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in the protein-protein interaction (PPI) network. Compared with the genes unique to HBV-HCCs, the genes particular to HCV-HCCs were fewer, but may participate in more extensive signaling processes. VEGFA, PI3KCA, MAPK1, MMP9 and other genes may play important roles in multiple phenotypes of metastasis. Genes in abstracts of the HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway analysis and PPI network analysis were performed. Then co-occurrence analysis between genes and metastasis-related phenotypes was carried out. Text mining is effective for revealing potential regulators or pathways, but its purpose should be specific, and a combination of various methods will be more useful.
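The co-occurrence step mentioned in the abstract above can be illustrated by counting, for each gene and phenotype pair, the number of abstracts that mention both. This Python sketch assumes exact token matching for gene symbols and substring matching for phenotype terms. The phenotype terms and the sample sentences are made up, and the counts are not results from the study.

```python
import re
from collections import Counter
from itertools import product


def cooccurrence_counts(abstracts, genes, phenotypes):
    """Count abstracts in which a gene symbol and a phenotype term both appear."""
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        lowered = text.lower()
        genes_found = [g for g in genes if g in tokens]
        phenotypes_found = [p for p in phenotypes if p in lowered]
        # Each abstract contributes at most one count per (gene, phenotype) pair.
        counts.update(product(genes_found, phenotypes_found))
    return counts


# Made-up sentences, for illustration only.
abstracts = [
    "MMP9 promotes invasion and angiogenesis in HCC.",
    "VEGFA drives angiogenesis.",
    "MMP9 expression correlates with invasion.",
]
counts = cooccurrence_counts(
    abstracts,
    genes=["MMP9", "VEGFA"],
    phenotypes=["invasion", "angiogenesis"],
)
print(counts.most_common())
```

Raw counts favor frequently studied genes, so a fuller analysis would normalize them, for example against each gene's overall frequency.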
GPU-Accelerated Text Mining

International Nuclear Information System (INIS)

Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

2009-01-01

Accelerating hardware devices represent a novel promise for improving performance in many problem domains, but it is not clear which accelerators are suitable for which domains. Because there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on the use of atomic instructions, which have recently become available in NVIDIA devices.
Text Mining in Biomedical Domain with Emphasis on Document Clustering.

Science.gov (United States)

Renganathan, Vinaitheerthan

2017-07-01

With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
Text mining meets workflow: linking U-Compare with Taverna

Science.gov (United States)

Kano, Yoshinobu; Dobson, Paul; Nakanishi, Mio; Tsujii, Jun'ichi; Ananiadou, Sophia

2010-01-01

Summary: Text mining from the biomedical literature is of increasing importance, yet it is not easy for the bioinformatics community to create and run text mining workflows due to the lack of accessibility and interoperability of the text mining resources. The U-Compare system provides a wide range of bio text mining resources in a highly interoperable workflow environment where workflows can very easily be created, executed, evaluated and visualized without coding. We have linked U-Compare to Taverna, a generic workflow system, to expose text mining functionality to the bioinformatics community. Availability: http://u-compare.org/taverna.html, http://u-compare.org Contact: kano@is.s.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20709690
Mining knowledge from text repositories using information extraction ...

Indian Academy of Sciences (India)

Information extraction (IE); text mining; text repositories; knowledge discovery from .... general purpose English words. However ... of precision and recall, as extensive experimentation is required due to lack of public tagged corpora. 4. Mining ...
76 FR 76104 - Arkansas Regulatory Program and Abandoned Mine Land Reclamation Plan

Science.gov (United States)

2011-12-06

... of their regulatory program and abandoned mine land plan, make grammatical changes, correct..., make grammatical changes, correct punctuation, revise dates, and add citations. The Arkansas... SPECIAL CATEGORIES OF MINING 785.14, 785.16, 785.18, and 785.25 Mountaintop Removal Mining; Permits...
Text Mining of Supreme Administrative Court Jurisdictions

OpenAIRE

Feinerer, Ingo; Hornik, Kurt

2007-01-01

Within the last decade text mining, i.e., extracting sensitive information from text corpora, has become a major factor in business intelligence. The automated textual analysis of law corpora is highly valuable because of its impact on a company's legal options and the raw amount of available jurisdiction. The study of supreme court jurisdiction and international law corpora is equally important due to its effects on business sectors. In this paper we use text mining methods to investigate Au...
Text mining in cancer gene and pathway prioritization.

Science.gov (United States)

Luo, Yuan; Riedlinger, Gregory; Szolovits, Peter

2014-01-01

Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study on Arabidopsis

Science.gov (United States)

Van Landeghem, Sofie; De Bodt, Stefanie; Drebert, Zuzanna J.; Inzé, Dirk; Van de Peer, Yves

2013-01-01

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein–protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies. PMID:23532071
Text mining and visualization case studies using open-source tools

CERN Document Server

Chisholm, Andrew

2016-01-01

Text Mining and Visualization: Case Studies Using Open-Source Tools provides an introduction to text mining using some of the most popular and powerful open-source tools: KNIME, RapidMiner, Weka, R, and Python. The contributors, all highly experienced with text mining and open-source software, explain how text data are gathered and processed from a wide variety of sources, including books, server access logs, websites, social media sites, and message boards. Each chapter presents a case study that you can follow as part of a step-by-step, reproducible example. You can also easily apply and extend the techniques to other problems. All the examples are available on a supplementary website. The book shows you how to exploit your text data, offering successful application examples and blueprints for you to tackle your text mining tasks and benefit from open and freely available tools. It gets you up to date on the latest and most powerful tools, the data mining process, and specific text mining activities.
Citation Mining: Integrating Text Mining and Bibliometrics for Research User Profiling.

Science.gov (United States)

Kostoff, Ronald N.; del Rio, J. Antonio; Humenik, James A.; Garcia, Esther Ofilia; Ramirez, Ana Maria

2001-01-01

Discusses the importance of identifying the users and impact of research, and describes an approach for identifying the pathways through which research can impact other research, technology development, and applications. Describes a study that used citation mining, an integration of citation bibliometrics and text mining, on articles from the…
Financial Statement Fraud Detection using Text Mining

OpenAIRE

Rajan Gupta; Nasib Singh Gill

2013-01-01

Data mining techniques have been used extensively by the research community to detect financial statement fraud. Most of the research in this direction has used the numbers (quantitative information), i.e. the financial ratios present in the financial statements, for detecting fraud. There is very little or no research on the analysis of text such as auditor’s comments or notes present in published reports. In this study we propose a text mining approach for detecting financial statement frau...
30 CFR 795.11 - Assistance funding.

Science.gov (United States)

2010-07-01

... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SMALL OPERATOR ASSISTANCE PERMANENT REGULATORY PROGRAM-SMALL OPERATOR ASSISTANCE PROGRAM § 795.11 Assistance... eligible small operators if available funds are less than those required to provide the services pursuant...
Application of text mining in the biomedical domain.

Science.gov (United States)

Fleuren, Wilco W M; Alkema, Wynand

2015-03-01

In recent years the amount of experimental data that is produced in biomedical research and the number of papers that are being published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers turn more and more to the use of automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and nowadays can be used to address a variety of research questions ranging from de novo drug target discovery to enhanced biological interpretation of the results from high throughput experiments. In this paper we introduce the most important techniques that are used for text mining and give an overview of the text mining tools that are currently being used and the type of problems they are typically applied to. Copyright © 2015 Elsevier Inc. All rights reserved.
Instream sand and gravel mining: Environmental issues and regulatory process in the United States

Science.gov (United States)

Meador, M.R.; Layher, A.O.

1998-01-01

Sand and gravel are widely used throughout the U.S. construction industry, but their extraction can significantly affect the physical, chemical, and biological characteristics of mined streams. Fisheries biologists often find themselves involved in the complex environmental and regulatory issues related to instream sand and gravel mining. This paper provides an overview of information presented in a symposium held at the 1997 midyear meeting of the Southern Division of the American Fisheries Society in San Antonio, Texas, to discuss environmental issues and regulatory procedures related to instream mining. Conclusions from the symposium suggest that complex physicochemical and biotic responses to disturbance such as channel incision and alteration of riparian vegetation ultimately determine the effects of instream mining. An understanding of geomorphic processes can provide insight into the effects of mining operations on stream function, and multidisciplinary empirical studies are needed to determine the relative effects of mining versus other natural and human-induced stream alterations. Mining regulations often result in a confusing regulatory process complicated, for example, by the role of the U.S. Army Corps of Engineers, which has undergone numerous changes and remains unclear. Dialogue among scientists, miners, and regulators can provide an important first step toward developing a plan that integrates biology and politics to protect aquatic resources.
Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

Science.gov (United States)

Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M

2012-10-01

In biological research, establishing the prior art by searching and collecting information already present in the domain has equal importance as the experiments done. To obtain a complete overview about the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from
OntoGene web services for biomedical text mining.

Science.gov (United States)

Rinaldi, Fabio; Clematide, Simon; Marques, Hernani; Ellendorff, Tilia; Romacker, Martin; Rodriguez-Esteban, Raul

2014-01-01

Text mining services are rapidly becoming a crucial component of various knowledge management pipelines, for example in the process of database curation, or for exploration and enrichment of biomedical data within the pharmaceutical industry. Traditional architectures, based on monolithic applications, do not offer sufficient flexibility for a wide range of use case scenarios, and therefore open architectures, as provided by web services, are attracting increased interest. We present an approach towards providing advanced text mining capabilities through web services, using a recently proposed standard for textual data interchange (BioC). The web services leverage a state-of-the-art platform for text mining (OntoGene) which has been tested in several community-organized evaluation challenges, with top-ranked results in several of them.
The regulatory process for uranium mines in Canada - general overview and radiation health and safety in uranium mine-mill facilities

International Nuclear Information System (INIS)

Dory, A.B.

1982-01-01

This presentation is divided into two main sections. In the first, the author explores the issues of radiation and tailings disposal, and then examines the Canadian nuclear regulatory process from the point of view of jurisdiction, objectives, philosophy and mechanics. The compliance inspection program is outlined, and the author discusses the relationships between the AECB and other regulatory agencies, the public and uranium mine-mill workers. The section concludes with an examination of the stance of the medical profession on nuclear issues. In part two, the radiological hazards for uranium miners are examined: radon daughters, gamma radiation, thoron daughters and uranium dust. The author touches on new regulations being drafted, the assessment of past exposures in mine atmospheres, and the regulatory approach at the surface exploration stage. The presentation concludes with the author's brief observations on the findings of other uranium mining inquiries and on future requirements in the industry's interests.
ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

Science.gov (United States)

2012-01-01

Clinical trials are mandatory protocols describing medical research on humans and are among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, which in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols. PMID:22595088
Text mining of web-based medical content

CERN Document Server

Neustein, Amy

2014-01-01

Text Mining of Web-Based Medical Content examines web mining for extracting useful information that can be used for treating and monitoring the healthcare of patients. This work provides methodological approaches to designing mapping tools that exploit data found in social media postings. Specific linguistic features of medical postings are analyzed vis-a-vis available data extraction tools for culling useful information.
MeSHmap: a text mining tool for MEDLINE.

OpenAIRE

Srinivasan, P.

2001-01-01

Our research goal is to explore text mining from the metadata included in MEDLINE documents. We present MeSHmap, our prototype text mining system that exploits the MeSH indexing accompanying MEDLINE records. MeSHmap supports searches via PubMed followed by user-driven exploration of the MeSH terms and subheadings in the retrieved set. The potential of the system goes beyond text retrieval. It may also be used to compare entities of the same type such as pairs of drugs or pairs of procedures et...
Text mining with R: a tidy approach

CERN Document Server

Silge, Julia

2017-01-01

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media. Learn how to apply the tidy text format to NLP. Use sentiment analysis to mine the emotional content of text. Identify a document's most important terms with frequency measurements. E...
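The idea of identifying a document's most important terms with frequency measurements can be illustrated with a small term frequency-inverse document frequency (tf-idf) calculation. The sketch below is plain Python, not the book's R code or the tidytext API; the function names and the two-document toy corpus are hypothetical, and it assumes the common definition tf = count / document length and idf = ln(number of documents / documents containing the term).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return {doc_name: {term: tf-idf}} for a dict of {doc_name: text}."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts.values() for term in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in c.items()
        }
    return scores


# Hypothetical toy corpus, for illustration only.
docs = {
    "a": "the gene regulates the protein",
    "b": "the mine requires the permit",
}
scores = tf_idf(docs)
for name, terms in scores.items():
    ranked = sorted(terms.items(), key=lambda kv: kv[1], reverse=True)
    print(name, ranked[:3])
```

A term that occurs in every document (here "the") receives a score of zero, which is why tf-idf tends to surface terms that are distinctive for one document rather than merely frequent.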
Text mining in the classification of digital documents

Directory of Open Access Journals (Sweden)

Marcial Contreras Barrera

2016-11-01

Objective: Develop an automated classifier for bibliographic material by means of text mining. Methodology: Text mining is used to develop the classifier, based on a supervised method with two phases, learning and recognition. In the learning phase, the classifier learns patterns by analyzing bibliographic records of class Z (library science, information sciences and information resources) retrieved from the LIBRUNAM database; this phase yields a classifier capable of recognizing different subclasses (LC). In the recognition phase, the classifier is validated and evaluated through classification tests: bibliographic records of class Z are taken at random, classified by a cataloguer and processed by the automated classifier, in order to obtain the precision of the automated classifier. Results: The application of text mining produced an automated classifier based on supervised document classification. The precision of the classifier, calculated by comparing the manually and automatically assigned topics, was 75.70%. Conclusions: The application of text mining facilitated the creation of the automated classifier, yielding useful technology for the classification of bibliographic material with the aim of improving and speeding up the process of organizing digital documents.
Romanian regulatory framework for uranium mining and milling (present and future)

International Nuclear Information System (INIS)

Rodna, A.L.; Dumitrescu, N.

2002-01-01

In Romania, all operations in the nuclear field, including uranium mining and milling, are regulated by Law no. 111/1996 (republished in 1998), regarding the safe conduct of nuclear activities. These activities can be performed only on the basis of an authorization released by the national regulatory authority, i.e. the National Commission for Nuclear Activities Control. The specific requirements which must be carried out by the owner of an operating licence for a uranium mining and milling operation are stipulated by the Republican Nuclear Safety Norms for Geological Research, Mining and Milling of Nuclear Raw Materials. These regulatory requirements have been in force since 1975. The regulatory norms include provisions that the effective dose limit for workers should not exceed 50 mSv/year and also that liquid effluents released into surface waters must have a content of natural radioactive elements that meets the standards for drinking water. The norms do not contain provisions concerning the conditions under which the mining sites and the uranium processing facilities can be shut down and decommissioned. The norms also do not contain requirements regarding either the rehabilitation of environments affected by abandoned mining and milling activities or criteria for the release of the rehabilitated sites for alternative uses. To implement the provisions of Council Directive 96/29 EURATOM in Romania, new Fundamental Radiological Protection Norms have been approved and will soon be published in the 'Monitorul Oficial' (Official Gazette of Romania). One of the main provisions of these norms is the reduction of the effective dose limit for the workers to 20 mSv/year. Changes in the Republican Nuclear Safety Norms for Geological Research, Mining and Milling of Nuclear Raw Materials are also planned; these changes will be consistent with the Fundamental Radiological Protection Norms. To cover existing gaps, the new norms for uranium mining and milling will include
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

Science.gov (United States)

Arighi, Cecilia N; Carterette, Ben; Cohen, K Bretonnel; Krallinger, Martin; Wilbur, W John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy

2013-01-01

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and
Public reactions to e-cigarette regulations on Twitter: a text mining analysis.

Science.gov (United States)

Lazard, Allison J; Wilcox, Gary B; Tuttle, Hannah M; Glowacki, Elizabeth M; Pikowski, Jessica

2017-12-01

In May 2016, the Food and Drug Administration (FDA) issued a final rule that deemed e-cigarettes to be within its regulatory authority as a tobacco product. News and opinions about the regulation were shared on social media platforms, such as Twitter, which can play an important role in shaping the public's attitudes. We analysed information shared on Twitter for insights into initial public reactions. A text mining approach was used to uncover important topics among reactions to the e-cigarette regulations on Twitter. SAS Text Miner V.12.1 software was used for descriptive text mining to uncover the primary topics from tweets collected with NUVI software from May 1 to May 17, 2016. A total of nine topics were generated. These topics reveal initial reactions to whether the FDA's e-cigarette regulations will benefit or harm public health, how the regulations will impact the emerging e-cigarette market and efforts to share the news. The topics were dominated by negative or mixed reactions. In the days following the FDA's announcement of the new deeming regulations, the public reaction on Twitter was largely negative. Public health advocates should consider using social media outlets to better communicate the policy's intentions, reach and potential impact for public good to create a more balanced conversation. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Text mining for traditional Chinese medical knowledge discovery: a survey.

Science.gov (United States)

Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan

2010-08-01

Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development into knowledge discovery approaches and to modernize TCM. In order to contribute to this still growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.
PathText: a text mining integrator for biological pathway visualizations

Science.gov (United States)

Kemper, Brian; Matsuzaki, Takuya; Matsuoka, Yukiko; Tsuruoka, Yoshimasa; Kitano, Hiroaki; Ananiadou, Sophia; Tsujii, Jun'ichi

2010-01-01

Motivation: Metabolic and signaling pathways are an increasingly important part of organizing knowledge in systems biology. They serve to integrate collective interpretations of facts scattered throughout literature. Biologists construct a pathway by reading a large number of articles and interpreting them as a consistent network, but most of the models constructed currently lack direct links to those articles. Biologists who want to check the original articles have to spend substantial amounts of time to collect relevant articles and identify the sections relevant to the pathway. Furthermore, with the scientific literature expanding by several thousand papers per week, keeping a model relevant requires a continuous curation effort. In this article, we present a system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This will enable biologists to freely move between parts of a pathway and relevant sections of articles, as well as identify relevant papers from large text bases. The system, PathText, is developed by Systems Biology Institute, Okinawa Institute of Science and Technology, National Centre for Text Mining (University of Manchester) and the University of Tokyo, and is being used by groups of biologists from these locations. Contact: brian@monrovian.com. PMID:20529930
Aspects of Text Mining From Computational Semiotics to Systemic Functional Hypertexts

Directory of Open Access Journals (Sweden)

Alexander Mehler

2001-05-01

The significance of natural language texts as the prime information structure for the management and dissemination of knowledge in organisations is still increasing. Making relevant documents available depending on varying tasks in different contexts is of primary importance for any efficient task completion. Implementing this demand requires the content-based processing of texts, which makes it possible to reconstruct or, if necessary, to explore the relationship of task, context and document. Text mining is a technology that is suitable for solving problems of this kind. In the following, semiotic aspects of text mining are investigated. Based on the primary object of text mining - natural language lexis - the specific complexity of this class of signs is outlined and requirements for the implementation of text mining procedures are derived. This is done with reference to text linkage, introduced as a special task in text mining. Text linkage refers to the exploration of implicit, content-based relations of texts (and their annotation as typed links) in corpora possibly organised as hypertexts. In this context, the term systemic functional hypertext is introduced, which distinguishes genre and register layers for the management of links in a poly-level hypertext system.
Biomedical text mining for research rigor and integrity: tasks, challenges, directions.

Science.gov (United States)

Kilicoglu, Halil

2017-06-13

An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise. Published by Oxford University Press 2017. This work is written by a US Government employee and is in the public domain in the US.
Mining environmental policy: comparing Indonesia and the USA

Energy Technology Data Exchange (ETDEWEB)

Michael S. Hamilton [University of Southern Maine in Portland, ME (United States)]

2005-12-15

Illustrated by a detailed comparative examination of mining regulations and environmental impact assessment (EIA) in the USA (the second largest producer of coal in the world) and Indonesia (the eighth largest and most rapidly growing), this book argues that the degree of policy integration often determines the success or failure in controlling environmental effects of mining operations. Comparison of surface mining regulation in the two countries provides some stark contrasts, some surprising results concerning the diffusion of policy innovations from one country to another, and instances of both policy success and failure. The book provides significant new insights into international relations and comparative environmental policy, particularly as they affect rainforests and biodiversity. It also suggests that if mining environmental policy were to be effectively implemented, the environmental degradation caused need not be permanent. Contents: Introduction: mining environmental policy implementation in two countries; Mining regulatory policy in Indonesia; Mining regulatory policy in the United States; Environmental assessment policy in two countries; Lost profits, royalties, and environmental quality; Developing mining environmental policy in Indonesia; Improving Indonesian regulatory program; Development of institutional capacity; Motivations for assistance; Conclusions.
United Nations programme for the assistance in Uruguay mining exploration

International Nuclear Information System (INIS)

1976-01-01

The government of Uruguay asked the United Nations to develop a technical assistance programme concerning geological considerations of the Valentines iron deposits. The agreement was signed under the title Mining prospection assistance in Uruguay.
Adverse Event extraction from Structured Product Labels using the Event-based Text-mining of Health Electronic Records (ETHER) system.

Science.gov (United States)

Pandey, Abhishek; Kreimeyer, Kory; Foster, Matthew; Botsis, Taxiarchis; Dang, Oanh; Ly, Thomas; Wang, Wei; Forshee, Richard

2018-01-01

Structured Product Labels follow an XML-based document markup standard approved by the Health Level Seven organization and adopted by the US Food and Drug Administration as a mechanism for exchanging medical products information. Their current organization makes their secondary use rather challenging. We used the Side Effect Resource database and DailyMed to generate a comparison dataset of 1159 Structured Product Labels. We processed the Adverse Reaction section of these Structured Product Labels with the Event-based Text-mining of Health Electronic Records system and evaluated its ability to extract and encode Adverse Event terms to Medical Dictionary for Regulatory Activities Preferred Terms. A small sample of 100 labels was then selected for further analysis. Of the 100 labels, Event-based Text-mining of Health Electronic Records achieved a precision and recall of 81 percent and 92 percent, respectively. This study demonstrated the ability of the Event-based Text-mining of Health Electronic Records system to extract and encode Adverse Event terms from Structured Product Labels, which may potentially support multiple pharmacoepidemiological tasks.
Text mining a self-report back-translation.

Science.gov (United States)

Blanch, Angel; Aluja, Anton

2016-06-01

There are several recommendations about the routine to undertake when back-translating self-report instruments in cross-cultural research. However, text mining methods have been generally ignored within this field. This work describes an innovative text mining application useful to adapt a personality questionnaire to 12 different languages. The method is divided into 3 different stages: a descriptive analysis of the available back-translated instrument versions, a dissimilarity assessment between the source language instrument and the 12 back-translations, and an item-level assessment of meaning equivalence. The suggested method contributes to improving the back-translation process of self-report instruments for cross-cultural research in 2 significant intertwined ways. First, it defines a systematic approach to the back-translation issue, allowing for a more orderly and informed evaluation concerning the equivalence of different versions of the same instrument in different languages. Second, it provides more accurate instrument back-translations, which has direct implications for the reliability and validity of the instrument's test scores when used in different cultures/languages. In addition, this procedure can be extended to the back-translation of self-reports measuring psychological constructs in clinical assessment. Future research works could refine the suggested methodology and use additional available text mining tools. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
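A dissimilarity assessment between a source text and its back-translations can be sketched with a bag-of-words comparison. The abstract does not say which dissimilarity measure the authors used, so the cosine dissimilarity below is an assumption chosen for illustration, and the item wording, language labels and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    # Bag-of-words counts over lowercase word tokens.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(
        sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    )
    # Two empty texts share nothing measurable; treat them as maximally distant.
    return 1.0 if norm == 0 else 1.0 - dot / norm


# Hypothetical questionnaire item and back-translations (not from the study).
source = "I often feel nervous in social situations"
back_translations = {
    "lang_1": "I often feel nervous in social situations",
    "lang_2": "I am frequently anxious around other people",
}
src = term_counts(source)
dissimilarity = {
    lang: cosine_dissimilarity(src, term_counts(text))
    for lang, text in back_translations.items()
}
for lang, value in dissimilarity.items():
    print(lang, round(value, 3))
```

A value near 0 means the back-translation reuses the source wording, and a value near 1 means almost no shared vocabulary. Because this measure looks only at surface words, a faithful paraphrase can still score as dissimilar, which is one reason a separate check of item meaning remains useful.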
Text Mining of Journal Articles for Sleep Disorder Terminologies.

Directory of Open Access Journals (Sweden)

Calvin Lam

Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
Science and Technology Text Mining Basic Concepts

National Research Council Canada - National Science Library

Losiewicz, Paul

2003-01-01

...). It then presents some of the most widely used data and text mining techniques, including clustering and classification methods, such as nearest neighbor, relational learning models, and genetic...
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

Science.gov (United States)

Westergaard, David; Stærfeldt, Hans-Henrik; Tønsberg, Christian; Jensen, Lars Juhl; Brunak, Søren

2018-02-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.

Science.gov (United States)

Singhal, Ayush; Leaman, Robert; Catlett, Natalie; Lemberger, Thomas; McEntyre, Johanna; Polson, Shawn; Xenarios, Ioannis; Arighi, Cecilia; Lu, Zhiyong

2016-01-01

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions including (i) the 'scalability' issue due to the increasing need of mining information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue on the difficulty of applying trained systems to text genres that are not seen previously during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
pubmed.mineR: An R package with text-mining algorithms to ...

Indian Academy of Sciences (India)

2015-09-29

Sep 29, 2015 ... using text-mining algorithms for biomedical research purposes. ... studies are described to illustrate some potential uses of ... This is the most applied task. ... other alphabets (for example, Greek alphabets) and hyphens.

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

Science.gov (United States)

Westergaard, David; Stærfeldt, Hans-Henrik

2018-01-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only. PMID:29447159
EnvMine: A text-mining system for the automatic extraction of contextual information

Directory of Open Access Journals (Sweden)

de Lorenzo Victor

2010-06-01

Background: For ecological studies, it is crucial to count on adequate descriptions of the environments and samples being studied. Such a description must be done in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would be difficult to do otherwise. Also the characterization must include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieve contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text, by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. Also a Bayesian classifier was tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location includes also the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distance between the individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical
Text Mining of Journal Articles for Sleep Disorder Terminologies.

Science.gov (United States)

Lam, Calvin; Lai, Fu-Chih; Wang, Chia-Hui; Lai, Mei-Hsin; Hsu, Nanly; Chung, Min-Huey

2016-01-01

Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
Text mining improves prediction of protein functional sites.

Directory of Open Access Journals (Sweden)

Karin M Verspoor

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

DEFF Research Database (Denmark)

Westergaard, David; Stærfeldt, Hans Henrik; Tønsberg, Christian

2018-01-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full...
Imitating manual curation of text-mined facts in biomedicine.

Directory of Open Access Journals (Sweden)

Raul Rodriguez-Esteban

2006-09-01

Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts in order to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
A Survey of Text Mining in Social Media: Facebook and Twitter Perspectives

Directory of Open Access Journals (Sweden)

Said A. Salloum

2017-01-01

Text mining has become one of the trendy fields that has been incorporated in several research fields such as computational linguistics, Information Retrieval (IR) and data mining. Natural Language Processing (NLP) techniques are used to extract knowledge from text written by human beings. Text mining reads an unstructured form of data to provide meaningful information patterns in the shortest possible time. Social networking sites are a great source of communication, as most people in today's world use these sites in their daily lives to keep connected to each other. It has become common practice on these sites not to write sentences with correct grammar and spelling. This practice may lead to different kinds of ambiguity (lexical, syntactic, and semantic), and with this type of unclear data it is hard to find out the actual data order. Accordingly, we are conducting an investigation with the aim of looking for different text mining methods to get various textual orders on social media websites. This survey aims to describe how studies in social media have used text analytics and text mining techniques for the purpose of identifying the key themes in the data. This survey focused on analyzing the text mining studies related to Facebook and Twitter, the two dominant social media in the world. Results of this survey can serve as the baselines for future text mining research.
Text Mining Improves Prediction of Protein Functional Sites

Science.gov (United States)

Cohn, Judith D.; Ravikumar, Komandur E.

2012-01-01

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388
DISEASES: text mining and data integration of disease-gene associations.

Science.gov (United States)

Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

2015-03-01

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
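The combination of a dictionary-based tagger with a co-occurrence score that counts both within-sentence and between-sentence mentions can be sketched as follows. This is a minimal illustration, not the DISEASES scoring scheme: the two dictionaries, the weights `W_SENTENCE` and `W_DOCUMENT`, and the function names are all assumptions made for the example.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical toy dictionaries; a real tagger would use large curated lexicons.
GENES = {"BRCA1", "TP53"}
DISEASES = {"breast cancer", "li-fraumeni syndrome"}

# Assumed weights: a pair in the same sentence counts more than a pair
# that merely shares the abstract.
W_SENTENCE = 1.0
W_DOCUMENT = 0.25


def tag(text, dictionary):
    """Return the dictionary terms that occur in text (case-insensitive, whole words)."""
    lowered = text.lower()
    return {
        term for term in dictionary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def score_abstract(abstract):
    """Score every gene-disease pair found in one abstract."""
    scores = defaultdict(float)
    # Between-sentence evidence: both terms appear somewhere in the abstract.
    for gene, disease in product(tag(abstract, GENES), tag(abstract, DISEASES)):
        scores[(gene, disease)] += W_DOCUMENT
    # Within-sentence evidence: both terms appear in the same sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        for gene, disease in product(tag(sentence, GENES), tag(sentence, DISEASES)):
            scores[(gene, disease)] += W_SENTENCE
    return dict(scores)


example = "BRCA1 mutations raise the risk of breast cancer. TP53 is also frequently altered."
print(score_abstract(example))
```

In the example, the BRCA1 pair receives both kinds of evidence while the TP53 pair receives only the weaker document-level weight, which is the distinction the abstract describes.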
Regulatory harmonization of the Saskatchewan uranium mines

International Nuclear Information System (INIS)

Forbes, R.; Moulding, T.; Alderman, G.

2006-01-01

The uranium mining industry in Saskatchewan produces approximately 30% of the world's production of uranium. The industry is regulated by federal and provincial regulators. The Canadian Nuclear Safety Commission is the principal federal regulator. The principal Saskatchewan provincial regulators are Saskatchewan Environment for provincial environmental regulations and Saskatchewan Labour for occupational health and safety regulations. In the past, mine and mill operators have requested harmonization in areas such as inspections and reporting requirements from the regulators. On February 14, 2003, Saskatchewan Environment, Saskatchewan Labour and the Canadian Nuclear Safety Commission signed a historical agreement for federal/provincial co-operation called the Canadian Nuclear Safety Commission - Saskatchewan Administrative Agreement for the Regulation of Health, Safety and the Environment at Saskatchewan Uranium Mines and Mills. This initiative responds to a recommendation made by the Joint Federal-Provincial Panel on Uranium Mining Developments in Northern Saskatchewan in 1997 and lays the groundwork to co-ordinate and harmonize their respective regulatory regimes. The implementation of the Agreement has been very successful. This paper will address the content of the Agreement including the commitments, the deliverables and the expectations for a harmonized compliance program, harmonized reporting, and the review of harmonized assessment and licensing processes as well as possible referencing of Saskatchewan Environment and Saskatchewan Labour regulations in the Nuclear Safety and Control Act. The management and implementation process will also be discussed including the schedule, stakeholder communication, the results to date and the lessons learned. (author)
Automated detection of follow-up appointments using text mining of discharge records.

Science.gov (United States)

Ruud, Kari L; Johnson, Matthew G; Liesinger, Juliette T; Grafft, Carrie A; Naessens, James M

2010-06-01

To determine whether text mining can accurately detect specific follow-up appointment criteria in free-text hospital discharge records. Cross-sectional study. Mayo Clinic Rochester hospitals. Inpatients discharged from general medicine services in 2006 (n = 6481). Textual hospital dismissal summaries were manually reviewed to determine whether the records contained specific follow-up appointment arrangement elements: date, time and either physician or location for an appointment. The data set was evaluated for the same criteria using SAS Text Miner software. The two assessments were compared to determine the accuracy of text mining for detecting records containing follow-up appointment arrangements. Agreement of text-mined appointment findings with gold standard (manual abstraction) including sensitivity, specificity, positive predictive and negative predictive values (PPV and NPV). About 55.2% (3576) of discharge records contained all criteria for follow-up appointment arrangements according to the manual review, 3.2% (113) of which were missed through text mining. Text mining incorrectly identified 3.7% (107) follow-up appointments that were not considered valid through manual review. Therefore, the text mining analysis concurred with the manual review in 96.6% of the appointment findings. Overall sensitivity and specificity were 96.8 and 96.3%, respectively; and PPV and NPV were 97.0 and 96.1%, respectively. Analysis of individual appointment criteria resulted in accuracy rates of 93.5% for date, 97.4% for time, 97.5% for physician and 82.9% for location. Text mining of unstructured hospital dismissal summaries can accurately detect documentation of follow-up appointment arrangement elements, thus saving considerable resources for performance assessment and quality-related research.
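The reported rates can be reproduced from the counts given in the abstract. The sketch below rebuilds the 2x2 table, assuming that the 113 missed records are false negatives and the 107 incorrectly identified records are false positives; the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard 2x2 agreement measures as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts taken from the abstract.
total_records = 6481
manual_positive = 3576   # records with all criteria per manual review
false_negatives = 113    # positives missed by text mining
false_positives = 107    # negatives flagged by text mining

# Derived cells of the 2x2 table.
true_positives = manual_positive - false_negatives
true_negatives = (total_records - manual_positive) - false_positives

metrics = diagnostic_metrics(true_positives, false_positives,
                             false_negatives, true_negatives)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Under these assumptions the derived cells are 3463 true positives and 2798 true negatives, and the resulting percentages agree with the sensitivity, specificity, PPV, NPV and overall agreement stated in the abstract.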
Monitoring interaction and collective text production through text mining

Directory of Open Access Journals (Sweden)

Macedo, Alexandra Lorandi

2014-04-01

This article presents the Concepts Network tool, developed using text mining technology. The main objective of this tool is to extract and relate terms of greatest incidence from a text and exhibit the results in the form of a graph. The Network was implemented in the Collective Text Editor (CTE), which is an online tool that allows the production of texts in synchronized or non-synchronized forms. This article describes the application of the Network both in texts produced collectively and texts produced in a forum. The purpose of the tool is to offer support to the teacher in managing the high volume of data generated in the process of interaction amongst students and in the construction of the text. Specifically, the aim is to facilitate the teacher’s job by allowing him/her to process data in a shorter time than is currently demanded. The results suggest that the Concepts Network can aid the teacher, as it provides indicators of the quality of the text produced. Moreover, messages posted in forums can be analyzed without their content necessarily having to be pre-read.
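Extracting the most frequent terms from a text and relating them in a graph can be illustrated with a small sketch. It is not the Concepts Network implementation: linking terms that share a sentence, the stopword list, and the name `concept_graph` are assumptions chosen to keep the example short.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed minimal stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}


def concept_graph(text, top_k=3):
    """Return the top_k most frequent terms and weighted edges between them.

    An edge links two top terms each time they appear in the same sentence.
    """
    sentences = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in re.split(r"[.!?]", text)
    ]
    frequency = Counter(word for sentence in sentences for word in sentence)
    top_terms = {word for word, _ in frequency.most_common(top_k)}

    edges = Counter()
    for sentence in sentences:
        present = sorted(top_terms.intersection(sentence))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return top_terms, dict(edges)


sample = "Students write the text. Students revise the text. The teacher reads the text."
nodes, edges = concept_graph(sample, top_k=2)
print(nodes, edges)
```

The edge weights give a rough indicator of which concepts a group keeps connecting, which is the kind of overview a teacher could read without going through every message.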
The legal and regulatory framework relative to safety and environment in the uranium mines in Niger

International Nuclear Information System (INIS)

Mahamane, S.

2001-01-01

The mining sector holds an important position in Niger's economy. Considerable funds have been invested in the promotion, exploration and exploitation of mineral resources since the colonial period. This has resulted in the discovery of numerous deposits, among which are those of uranium. Today, uranium represents more than 3/4 of Niger's export revenues. The mining sector is supervised by the Ministry of Mines and Energy. The Ministry applies the mining policy as defined by the government. It drafts legislative and regulatory texts and sees to their implementation. Regarding uranium, mining activities have been governed since 1961 by various orientation laws and implementation decrees. However, to cope with the harmful consequences for the national economy of successive drops in the price and sales of its major export product, and taking into account the new international requirements relating to economic globalization and sustainable development, Niger set up a strategy to diversify its mining production, as part of which a new, particularly incentive-oriented mining code was established in 1993. The new mining code provides significant advantages to investors. These advantages ensure that their investments in Niger are highly cost-effective and make compliance with regulations on safety and environmental protection easier and less onerous. Considerable efforts have thus been made by the IAEA, the Ministry of Mines and Energy and the uranium companies for optimal protection of workers and the public, especially against the hazards of ionizing radiation. This determination to improve the situation has resulted in the adoption of several laws and their application decrees, as well as various sectoral laws designed by the Ministry departments concerned with environmental issues and risk prevention. Among these texts are the renewal of the order No 31 M/MH, which has defined since 1979 the main lines of Niger's regulations with regard to radioprotection, and the design of
DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

Science.gov (United States)

Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K

2016-01-01

The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.
DrugQuest - a text mining workflow for drug association discovery.

Science.gov (United States)

Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Vizirianakis, Ioannis S; Iliopoulos, Ioannis

2016-06-06

Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Herein, we apply a text mining approach on the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface. DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.
VisualUrText: A Text Analytics Tool for Unstructured Textual Data

Science.gov (United States)

Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.

2018-05-01

The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text mining is a well-known technique for discovering interesting patterns and trends, which are non-trivial knowledge, from massive unstructured text data. Text mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing and analyzing unstructured text data and visualizing cleaned text data in multiple forms such as Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.
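A Document Term Matrix, one of the output forms listed above, has one row per document and one column per vocabulary term, with each cell holding a term count. The sketch below builds one with the standard library only; it is an illustration of the data structure, not VisualUrText's implementation, and the tokenizer and function names are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


docs = ["Text mining finds patterns", "Mining text data"]
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Column sums of this matrix give the term frequencies behind a frequency graph or word cloud, and row vectors can be compared to cluster documents into a dendrogram.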
CONAN : Text Mining in the Biomedical Domain

NARCIS (Netherlands)

Malik, R.

2006-01-01

This thesis is about text mining: extracting important information from literature. In recent years, the number of biomedical articles and journals has been growing exponentially. Scientists might not find the information they want because of the large number of publications. Therefore a system was
Mining highly stressed areas, part 2.

CSIR Research Space (South Africa)

Johnson, R

1995-12-01

A questionnaire related to mining at great depth and in very high stress conditions has been completed with the assistance of mine rock mechanics personnel on over twenty mines in all mining districts, and covering all deep level mines...
Opinion Mining in Latvian Text Using Semantic Polarity Analysis and Machine Learning Approach

Directory of Open Access Journals (Sweden)

Gatis Špats

2016-07-01

In this paper we demonstrate approaches for opinion mining in Latvian text. The authors have applied, combined and extended results of several previous studies and public resources to perform opinion mining in Latvian text using two approaches, namely, semantic polarity analysis and machine learning. One of the most significant constraints that make application of opinion mining for written content classification in Latvian text challenging is the limited publicly available text corpora for classifier training. We have joined several sources and created a publicly available extended lexicon. Our results are comparable to or outperform current achievements in opinion mining in Latvian. Experiments show that lexicon-based methods provide more accurate opinion mining than the application of a Naive Bayes machine learning classifier on Latvian tweets. Methods used during this study could be further extended using human annotators, unsupervised machine learning and bootstrapping to create larger corpora of classified text.
Beyond accuracy: creating interoperable and scalable text-mining web services.

Science.gov (United States)

Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong

2016-06-15

The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl. Contact: Zhiyong.Lu@nih.gov. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

Regulatory aspects of radiological hazards in mines and works

International Nuclear Information System (INIS)

Metcalf, P.E.; Guy, M.S.C.; Van Sittert, J.M.O.

1987-01-01

The mining and processing of ores containing uranium and thorium can give rise to exposure to ionizing radiation via various pathways to both persons employed in mines and works and to members of the general public living in the vicinity of such facilities. Such exposures to radiation will not give rise to unacceptable levels of risk provided the sources of such exposure are identified, quantified and controlled. In order to establish the necessary regulatory framework to ensure such identification, quantification and control, regulations are currently being promulgated under the Mines and Works Act to require that activities involving nuclear hazard material resulting from the mining and processing of materials containing uranium and thorium will be subject to such conditions that will ensure compliance with a system of radiation dose limitation laid down. It is proposed that conditions be established for each mine and works in the form of a Code of Practice specific to the facility, the provisions of which will be commensurate with the potential magnitude of the particular radiological hazards prevailing at that facility. Generic guidelines will be made available to enable suitable Codes of Practice to be drawn up. This paper discusses the various radiological hazards that may have to be considered, the scope and content of the generic guidelines and the dose limitation system with which it will be necessary to comply. 5 refs., 2 tabs
Using ontology network structure in text mining.

Science.gov (United States)

Berndt, Donald J; McCart, James A; Luther, Stephen L

2010-11-13

Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
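The hybrid idea in this abstract (graph measures of term importance injected into bag-of-words counts) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the toy ontology, the function names, and the choice to multiply raw term counts by a rescaled PageRank score. The abstract does not say how the authors combine the two signals.

```python
from collections import Counter


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [successor, ...]}."""
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, successors in graph.items():
            for v in successors:
                new_rank[v] += damping * rank[u] / len(successors)
        rank = new_rank
    return rank


# Hypothetical toy ontology: an edge points from a term to its parent ("is-a").
ONTOLOGY = {
    "cigarette": ["tobacco"],
    "cigar": ["tobacco"],
    "tobacco": ["substance"],
    "nicotine": ["substance"],
    "substance": [],
}


def weighted_bag_of_words(text, importance):
    """Scale raw term counts by ontology importance (assumed combination rule)."""
    counts = Counter(text.lower().split())
    scale = len(importance)  # rescale so an average term has weight 1.0
    return {t: c * importance[t] * scale for t, c in counts.items() if t in importance}


importance = pagerank(ONTOLOGY)
features = weighted_bag_of_words("patient smokes one cigar and chews tobacco", importance)
```

In this sketch, terms that sit higher in the ontology accumulate more rank and therefore get larger feature weights; HITS or another graph measure could be substituted for `pagerank` without changing the rest.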
A tm Plug-In for Distributed Text Mining in R

Directory of Open Access Journals (Sweden)

Stefan Theussl

2012-11-01

R has gained explicit text mining support with the tm package enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data to be processed in a single machine is usually limited by the available main memory (i.e., RAM), and (2) the more data to be analyzed the higher the need for efficient procedures for calculating valuable results. Fortunately, adequate programming models like MapReduce facilitate parallelization of text mining tasks and allow for processing data sets beyond what would fit into memory by using a distributed file system possibly spanning over several machines, e.g., in a cluster of workstations. In this paper we present a plug-in package to tm called tm.plugin.dc implementing a distributed corpus class which can take advantage of the Hadoop MapReduce library for large scale text mining tasks. We show on the basis of an application in culturomics that we can efficiently handle data sets of significant size.
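The MapReduce pattern mentioned in this abstract can be sketched in a few lines. This is an in-process Python illustration only, not the tm.plugin.dc or Hadoop API; the function names and the chunk size are assumptions. Each chunk of documents is mapped to partial term counts, and the partial counts are then reduced into one result, so no step needs the whole corpus at once.

```python
from collections import Counter
from functools import reduce


def chunked(items, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def map_chunk(documents):
    """Map step: count terms within one chunk of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial term counts."""
    return left + right


def term_frequencies(corpus, chunk_size=2):
    """Run the map step per chunk, then fold the partial results together."""
    partials = map(map_chunk, chunked(corpus, chunk_size))
    return reduce(reduce_counts, partials, Counter())
```

In a real distributed setting the map calls would run on different machines against a distributed file system; here they run sequentially so the example stays self-contained.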
A text-mining system for extracting metabolic reactions from full-text articles.

Science.gov (United States)

Czarnecki, Jan; Nobeli, Irene; Smith, Adrian M; Shepherd, Adrian J

2012-07-23

Increasingly biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway - metabolic pathways - has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein-protein interactions. When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
Text Mining for Protein Docking.

Directory of Open Access Journals (Sweden)

Varsha D Badal

2015-12-01

The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound
Mining Gene Regulatory Networks by Neural Modeling of Expression Time-Series.

Science.gov (United States)

Rubiolo, Mariano; Milone, Diego H; Stegmayer, Georgina

2015-01-01

Discovering gene regulatory networks from data is one of the most studied topics in recent years. Neural networks can be successfully used to infer an underlying gene network by modeling expression profiles as time series. This work proposes a novel method based on a pool of neural networks for obtaining a gene regulatory network from a gene expression dataset. The networks are used for modeling each possible interaction between pairs of genes in the dataset, and a set of mining rules is applied to accurately detect the underlying relations among genes. The results obtained on artificial and real datasets confirm the method's effectiveness for discovering regulatory networks from a proper modeling of the temporal dynamics of gene expression profiles.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

Science.gov (United States)

Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

2018-04-27

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Empirical advances with text mining of electronic health records.

Science.gov (United States)

Delespierre, T; Denormandie, P; Bar-Hen, A; Josseran, L

2017-08-22

Korian is a private group specializing in medical accommodations for elderly and dependent people. A professional data warehouse (DWH) established in 2010 hosts all of the residents' data. Inside this information system (IS), clinical narratives (CNs) were used only by medical staff as a residents' care linking tool. The objective of this study was to show that, through qualitative and quantitative textual analysis of a relatively small physiotherapy and well-defined CN sample, it was possible to build a physiotherapy corpus and, through this process, generate a new body of knowledge by adding relevant information to describe the residents' care and lives. Meaningful words were extracted through Structured Query Language (SQL) with the LIKE operator and wildcards to perform pattern matching, followed by text mining and a word cloud using R® packages. Another step involved principal components and multiple correspondence analyses, plus clustering on the same residents' sample as well as on other health data using a health model measuring the residents' care level needs. By combining these techniques, physiotherapy treatments could be characterized by a list of constructed keywords, and the residents' health characteristics were built. Feeding defects or health outlier groups could be detected, physiotherapy residents' data and their health data were matched, and differences in health situations showed qualitative and quantitative differences in physiotherapy narratives. This textual experiment using a textual process in two stages showed that text mining and data mining techniques provide convenient tools to improve residents' health and quality of care by adding new, simple, useable data to the electronic health record (EHR). When used with a normalized physiotherapy problem list, text mining through information extraction (IE), named entity recognition (NER) and data mining (DM) can provide a real advantage to describe health care, adding new medical material and
Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures

Science.gov (United States)

Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.

2017-08-01

Text mining is one of the most promising ways of automatically extracting information from the huge biological literature. To exploit its potential, the knowledge encoded in the text should be converted to some semantic representation, such as entities and relations, that can be analyzed by machines. Large-scale practical systems for this purpose are rare, but text mining could be helpful for generating or validating predictions. Cellulases, the enzymes that degrade cellulose, have abundant applications in various industries. The cellulase-producing bacterium Bacillus subtilis and fungus Pseudomonas putida were isolated from top soil of Guntur Dt., A.P., India. Absolute cultures were conserved on potato dextrose agar medium for molecular studies. In this paper, we present how well text mining concepts can be used to analyze cellulase-producing bacteria and fungi. Their comparative structures are also studied with the aid of well-established, high-quality standard bioinformatic tools such as BioEdit, Swiss-Prot, ProtParam and EMBOSSwin, with which complete data on cellulases, such as the structure and constituents of the enzyme, have been obtained.
BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics

Energy Technology Data Exchange (ETDEWEB)

Wu, Cathy H. [Univ. of Delaware, Newark, DE (United States). Center for Bioinformatics and Computational Biology]; Hirschman, Lynette [The MITRE Corporation, Bedford, MA (United States)]

2016-10-29

The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.
Text mining for adverse drug events: the promise, challenges, and state of the art.

Science.gov (United States)

Harpaz, Rave; Callahan, Alison; Tamang, Suzanne; Low, Yen; Odgers, David; Finlayson, Sam; Jung, Kenneth; LePendu, Paea; Shah, Nigam H

2014-10-01

Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources, such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs, that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance.
The Role of Text Mining in Export Control

Energy Technology Data Exchange (ETDEWEB)

Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon [Korea Institute of Nuclear Nonproliferation and Control, Daejeon (Korea, Republic of)

2015-10-15

The Korean government provides classification services to exporters. Technology such as documents and drawings is simple to copy, and new technology is easily derived from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should give sufficient consideration to previous classification cases. However, the growing number of classification cases prevents consistent classification, which makes new, innovative and effective approaches necessary. IXCRS (Intelligent Export Control Review System) is proposed to meet these demands. IXCRS consists of an expert system, a semantic searching system, a full text retrieval system, an image retrieval system and a document retrieval system. The aim of the present paper is to examine the document retrieval system based on text mining and to discuss how to utilize the system. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system helps reviewers to handle previous classification cases effectively. In particular, it is highly probable that similarity data will contribute to specifying classification criteria. However, an analysis of the system showed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion relationship problem. Further research should be directed at solving these problems and applying more data mining techniques so that the system can be used as a useful tool for export control.
A Text-Mining Framework for Supporting Systematic Reviews.

Science.gov (United States)

Li, Dingcheng; Wang, Zhen; Wang, Liwei; Sohn, Sunghwan; Shen, Feichen; Murad, Mohammad Hassan; Liu, Hongfang

2016-11-01

Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide a high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that when 91.8%, 85.7%, and 49.3% of the abstract screening labor was saved, the recalls were as high as 100% for the three cases, respectively. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as a relevance metric. It was demonstrated that advanced text mining approaches can significantly reduce the abstract screening labor of SRs and provide an informative summary of relevant studies.
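The trade-off reported in this abstract, screening labor saved versus recall, can be made concrete with a small sketch. The definitions below are assumptions, since the abstract does not give formulas: abstracts are ranked by the unweighted mean of the three relevance scores, only the top-ranked ones are screened, recall is the share of relevant abstracts among those screened, and work saved is the share of abstracts never screened. All names and scores are hypothetical.

```python
def rank_abstracts(scores):
    """Order abstract ids by the mean of their relevance scores, best first.

    `scores` maps an id to a (keyword, indexed_term, topic) tuple; averaging
    the three metrics with equal weight is an assumption of this sketch.
    """
    return sorted(scores, key=lambda a: sum(scores[a]) / len(scores[a]), reverse=True)


def recall_and_work_saved(ranking, relevant, screened):
    """Evaluate a cutoff: screen only the first `screened` ranked abstracts."""
    top = set(ranking[:screened])
    recall = len(top & relevant) / len(relevant)
    work_saved = 1 - screened / len(ranking)
    return recall, work_saved


# Hypothetical scores for five retrieved abstracts.
scores = {
    "a": (0.9, 0.8, 0.7),
    "b": (0.1, 0.2, 0.1),
    "c": (0.8, 0.9, 0.9),
    "d": (0.2, 0.1, 0.3),
    "e": (0.3, 0.2, 0.2),
}
ranking = rank_abstracts(scores)
recall, work_saved = recall_and_work_saved(ranking, relevant={"a", "c"}, screened=2)
```

With these toy numbers both relevant abstracts land in the top two positions, so screening stops after two of five abstracts without losing recall.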
pubmed.mineR: an R package with text-mining algorithms to analyse PubMed abstracts.

Science.gov (United States)

Rani, Jyoti; Shah, A B Rauf; Ramachandran, Srinivasan

2015-10-01

The PubMed literature database is a valuable source of information for scientific research. It is rich in biomedical literature with more than 24 million citations. Data-mining of voluminous literature is a challenging task. Although several text-mining algorithms have been developed in recent years with a focus on data visualization, they have limitations: they are slow, rigid and not available as open source. We have developed an R package, pubmed.mineR, wherein we have combined the advantages of existing algorithms, overcome their limitations, and offer user flexibility and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) in order to expand the user capabilities for executing multifaceted approaches. Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus sizes and with compute intensive functions. The pubmed.mineR package is available at http://cran.r-project.org/web/packages/pubmed.mineR.
Supporting the education evidence portal via text mining

Science.gov (United States)

Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

2010-01-01

The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679
Text mining by Tsallis entropy

Science.gov (United States)

Jamaati, Maryam; Mehri, Ali

2018-01-01

Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics in text mining. Tsallis entropy appropriately ranks the terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new powerful word ranking metric in order to extract keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word ranking performance, at the same level as the best previous ranking methods.
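For readers unfamiliar with the quantity named in this abstract, the standard Tsallis entropy of a discrete distribution is S_q = (1 - sum_i p_i^q) / (q - 1), which reduces to the Shannon entropy as q approaches 1. The sketch below implements only this textbook definition. Which distribution the authors compute for each word, and which value of q they use, is not stated in the abstract, so the example distributions are hypothetical.

```python
import math


def tsallis_entropy(probabilities, q):
    """Tsallis entropy S_q of a discrete probability distribution.

    For q == 1 the Shannon entropy (natural logarithm) is returned,
    which is the limit of S_q as q approaches 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probabilities if p > 0)
    return (1.0 - sum(p ** q for p in probabilities if p > 0)) / (q - 1.0)


# Hypothetical share of a word's occurrences in four equal parts of a text.
evenly_spread = [0.25, 0.25, 0.25, 0.25]
clustered = [0.7, 0.1, 0.1, 0.1]

spread_score = tsallis_entropy(evenly_spread, q=2)
cluster_score = tsallis_entropy(clustered, q=2)
```

A word whose occurrences are clustered in one part of the text yields a lower entropy than a word spread evenly, which is the kind of contrast a word-ranking metric can exploit.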
Text Mining to Support Gene Ontology Curation and Vice Versa.

Science.gov (United States)

Ruch, Patrick

2017-01-01

In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of Gene Ontology (GO) descriptors, the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances of text mining instruments are directly dependent on the availability of high-quality annotated contents at every curation step. Database workflows must start recording explicitly all the data they curate and ideally also some of the data they do not curate.
Regulatory philosophy and requirements for radiation control in Canadian uranium mine-mill facilities

International Nuclear Information System (INIS)

Dory, A.B.

1981-10-01

The approach the Canadian Atomic Energy Control Board takes in licensing uranium mine/mill facilities is based on a minimum of rigidly set regulatory requirements. The regulations state only the basic objectives: the obligation to acquire a licence, some administrative and reporting requirements, and exposure limits. The regulations are supported by a set of regulatory guides. The operator always has the option of following different procedures if he can demonstrate that they will produce the same or better results. Good relationships exist between the AECB and mine management as well as trade unions. Under this approach, however, it is difficult to take action against uncooperative parties. The Board has decided that a somewhat more formalized system is necessary. New regulations are being drafted, giving more detailed licensing and administrative requirements and covering the areas of ventilation and worker and supervisor education more thoroughly
Ultrasound-assisted extraction for total sulphur measurement in mine tailings

International Nuclear Information System (INIS)

Khan, Adnan Hossain; Shang, Julie Q.; Alam, Raquibul

2012-01-01

Highlights: ► We develop a procedure for measuring total sulphur in mine tailings. ► Ultrasound is used in the sample pre-treatment process. ► Full factorial design is applied to identify the best levels of the influencing factors. - Abstract: A sample preparation method for percentage recovery of total sulphur (%S) in reactive mine tailings based on ultrasound-assisted digestion (USAD) and inductively coupled plasma-optical emission spectroscopy (ICP-OES) was developed. The influence of various methodological factors was screened by employing a two-level and three-factor (2^3) full factorial design and using KZK-1, a sericite schist certified reference material (CRM), to find the optimal combination of studied factors and %S. Factors such as the sonication time, temperature and acid combination were studied, with the best result identified as 20 min of sonication, 80 °C temperature and 1 ml of HNO3:1 ml of HCl, which can achieve 100% recovery for the selected CRM. Subsequently a fraction of the 2^3 full factorial design was applied to mine tailings. The percentage relative standard deviation (%RSD) for the ultrasound method is less than 3.0% for CRM and less than 6% for the mine tailings. The investigated method was verified by X-ray diffraction analysis. The USAD method compared favorably with existing methods such as hot plate assisted digestion method, X-ray fluorescence and LECO™-CNS method.
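A two-level, three-factor full factorial design simply enumerates every combination of the low and high level of each factor, giving 2 x 2 x 2 = 8 runs. The sketch below lists those runs. Only the best-performing levels (20 min, 80 °C, 1 ml HNO3:1 ml HCl) come from the abstract; the other level of each factor is a hypothetical placeholder, as are all names in the code.

```python
from itertools import product

# One level per factor is taken from the abstract; the other is hypothetical.
FACTORS = {
    "sonication_min": (10, 20),                          # 10 is a placeholder
    "temperature_c": (60, 80),                           # 60 is a placeholder
    "acid_mix": ("HNO3 only", "1 ml HNO3 : 1 ml HCl"),   # "HNO3 only" is a placeholder
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of run dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


runs = full_factorial(FACTORS)
for number, run in enumerate(runs, start=1):
    print(number, run)
```

Each of the eight runs would be digested and measured for %S; a fractional design, as applied to the tailings in the abstract, would use a chosen subset of these runs.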

PubstractHelper: A Web-based Text-Mining Tool for Marking Sentences in Abstracts from PubMed Using Multiple User-Defined Keywords.

Science.gov (United States)

Chen, Chou-Cheng; Ho, Chung-Liang

2014-01-01

While a huge amount of information about biological literature can be obtained by searching the PubMed database, reading through all the titles and abstracts resulting from such a search for useful information is inefficient. Text mining makes it possible to increase this efficiency. Some websites use text mining to gather information from the PubMed database; however, they are database-oriented, using pre-defined search keywords while lacking a query interface for user-defined search inputs. We present the PubMed Abstract Reading Helper (PubstractHelper) website, which combines text mining and reading assistance for an efficient PubMed search. PubstractHelper can accept a maximum of ten groups of keywords, with each group containing up to ten keywords. The principle behind the text-mining function of PubstractHelper is that keywords contained in the same sentence are likely to be related. PubstractHelper highlights sentences with co-occurring keywords in different colors. The user can download the PMID and the abstracts with color markings to be reviewed later. The PubstractHelper website can help users to identify relevant publications based on the presence of related keywords, which should be a handy tool for their research. http://bio.yungyun.com.tw/ATM/PubstractHelper.aspx and http://holab.med.ncku.edu.tw/ATM/PubstractHelper.aspx.
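
The co-occurrence principle described in this abstract (keywords in the same sentence are likely to be related) can be sketched as follows. This is not the PubstractHelper implementation: the sentence splitter is deliberately naive, and the function names, keyword groups and sample text are all made up for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurring_sentences(text, keyword_groups, min_groups=2):
    """Return (sentence, matched group names) for every sentence that
    contains keywords from at least `min_groups` different groups."""
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        matched = [
            name
            for name, keywords in keyword_groups.items()
            if any(keyword.lower() in lowered for keyword in keywords)
        ]
        if len(matched) >= min_groups:
            hits.append((sentence, matched))
    return hits


# Toy input, invented for this sketch.
abstract = (
    "BRCA1 mutations are linked to ovarian cancer. "
    "The cohort was recruited in 2010. "
    "Cisplatin response was measured."
)
groups = {
    "gene": ["BRCA1", "TP53"],
    "disease": ["ovarian cancer"],
    "drug": ["cisplatin"],
}

for sentence, matched in cooccurring_sentences(abstract, groups):
    print(matched, sentence)
```

A real tool would map each matched group to a highlight color; here the group names stand in for the colors.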
Advances in Text Mining and Visualization for Precision Medicine.

Science.gov (United States)

Gonzalez-Hernandez, Graciela; Sarker, Abeed; O'Connor, Karen; Greene, Casey; Liu, Hongfang

2018-01-01

According to the National Institutes of Health (NIH), precision medicine is "an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person." Although the text mining community has explored this realm for some years, the official endorsement and funding launched in 2015 with the Precision Medicine Initiative are beginning to bear fruit. This session sought to elicit participation of researchers with strong background in text mining and/or visualization who are actively collaborating with bench scientists and clinicians for the deployment of integrative approaches in precision medicine that could impact scientific discovery and advance the vision of precision medicine as a universal, accessible approach at the point of care.
Application of text mining for customer evaluations in commercial banking

Science.gov (United States)

Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.

2015-07-01

Customer attrition is an increasingly serious problem for commercial banks. To combat this problem thoroughly, mining customer evaluation texts is as important as mining structured customer data. To extract hidden information from customer evaluations, Textual Feature Selection, Classification and Association Rule Mining are necessary techniques. This paper applies all three techniques using Chinese Word Segmentation, C5.0 and Apriori, and a set of experiments was run on a collection of real textual data comprising 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions and some advice for the commercial bank are given in this paper.
Using Text Mining to Characterize Online Discussion Facilitation

Science.gov (United States)

Ming, Norma; Baumer, Eric

2011-01-01

Facilitating class discussions effectively is a critical yet challenging component of instruction, particularly in online environments where student and faculty interaction is limited. Our goals in this research were to identify facilitation strategies that encourage productive discussion, and to explore text mining techniques that can help…
Practical text mining and statistical analysis for non-structured text data applications

CERN Document Server

Miner, Gary; Hill, Thomas; Nisbet, Robert; Delen, Dursun

2012-01-01

The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase d
Using text-mining techniques in electronic patient records to identify ADRs from medicine use

DEFF Research Database (Denmark)

Warrer, Pernille; Hansen, Ebba Holme; Jensen, Lars Juhl

2012-01-01

This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text...
Biomedical hypothesis generation by text mining and gene prioritization.

Science.gov (United States)

Petric, Ingrid; Ligeti, Balazs; Gyorffy, Balazs; Pongor, Sandor

2014-01-01

Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al, J. Biomed. Inform. 42(2): 219-227, 2009) in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.
Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

DEFF Research Database (Denmark)

Debortoli, Stefan; Müller, Oliver; Junglas, Iris

2016-01-01

It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video); and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic... researchers, this tutorial provides some guidance for conducting text mining studies on their own and for evaluating the quality of others.
Negation scope and spelling variation for text-mining of Danish electronic patient records

DEFF Research Database (Denmark)

Thomas, Cecilia Engel; Jensen, Peter Bjødstrup; Werge, Thomas

2014-01-01

Electronic patient records are a potentially rich data source for knowledge extraction in biomedical research. Here we present a method based on the ICD10 system for text-mining of Danish health records. We have evaluated how adding functionalities to a baseline text-mining tool affected...
Knowledge based word-concept model estimation and refinement for biomedical text mining.

Science.gov (United States)

Jimeno Yepes, Antonio; Berlanga, Rafael

2015-02-01

Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, and thus the performance of KB-based methods is usually lower than that of supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not useful for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
Ultrasound-assisted extraction for total sulphur measurement in mine tailings

Energy Technology Data Exchange (ETDEWEB)

Khan, Adnan Hossain, E-mail: ad_li2@yahoo.com [Department of Civil and Environmental Engineering, University of Western Ontario (Canada); Shang, Julie Q.; Alam, Raquibul [Department of Civil and Environmental Engineering, University of Western Ontario (Canada)

2012-10-15

Highlights: ► We develop a total sulphur measuring procedure of mine tailings. ► Ultrasound is used in the sample pre-treatment process. ► Full factorial design is applied to identify the best level of effecting factors. - Abstract: A sample preparation method for percentage recovery of total sulphur (%S) in reactive mine tailings based on ultrasound-assisted digestion (USAD) and inductively coupled plasma-optical emission spectroscopy (ICP-OES) was developed. The influence of various methodological factors was screened by employing a two-level and three-factor (2³) full factorial design and using KZK-1, a sericite schist certified reference material (CRM), to find the optimal combination of studied factors and %S. Factors such as the sonication time, temperature and acid combination were studied, with the best result identified as 20 min of sonication, 80 °C temperature and 1 ml of HNO₃:1 ml of HCl, which can achieve 100% recovery for the selected CRM. Subsequently a fraction of the 2³ full factorial design was applied to mine tailings. The percentage relative standard deviation (%RSD) for the ultrasound method is less than 3.0% for CRM and less than 6% for the mine tailings. The investigated method was verified by X-ray diffraction analysis. The USAD method compared favorably with existing methods such as hot plate assisted digestion method, X-ray fluorescence and LECO™-CNS method.
Mining protein function from text using term-based support vector machines

Science.gov (United States)

Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J

2005-01-01

Background: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results: The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion: A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
Hot complaint intelligent classification based on text mining

Directory of Open Access Journals (Sweden)

XIA Haifeng

2013-10-01

The complaint recognizer system plays an important role in ensuring the correct classification of hot complaints and improving the service quality of the telecommunications industry. Customer complaints in the telecommunications industry have the particular constraint that they must be handled within a limited time, which causes errors in the classification of hot complaints. The paper presents a model for intelligent classification of hot complaints based on text mining, which can classify a hot complaint into the correct level of the complaint navigation. The examples show that the model can efficiently classify complaint text.
Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

Science.gov (United States)

Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

2013-01-16

The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a)For text mining, preprocessing the EHR corpus with fingerprinting yields
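
The idea of removing copy-and-paste redundancy by fingerprinting can be sketched in a few lines. This is a generic word-shingle illustration, not the algorithm evaluated in the paper; the shingle length, the threshold, the function names and the sample notes are all assumptions.

```python
import hashlib


def fingerprints(text, n=4):
    """Hash every n-word shingle of the text into a set of short fingerprints."""
    words = text.lower().split()
    count = max(len(words) - n + 1, 1)
    shingles = (" ".join(words[i:i + n]) for i in range(count))
    return {hashlib.sha256(s.encode("utf-8")).hexdigest()[:12] for s in shingles}


def drop_redundant(notes, threshold=0.5, n=4):
    """Keep a note only if less than `threshold` of its fingerprints
    already appeared in previously kept notes."""
    seen = set()
    kept = []
    for note in notes:
        fps = fingerprints(note, n)
        overlap = len(fps & seen) / len(fps)
        if overlap < threshold:
            kept.append(note)
            seen |= fps
    return kept


# Invented notes: the second is a near copy of the first.
notes = [
    "patient reports chest pain radiating to the left arm since yesterday",
    "patient reports chest pain radiating to the left arm since yesterday evening",
    "follow up visit for diabetes management with stable glucose levels today",
]
for note in drop_redundant(notes):
    print(note)
```

The second note shares eight of its nine shingles with the first, so it is dropped; the third shares none and is kept.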
Ultrasound-assisted extraction for total sulphur measurement in mine tailings.

Science.gov (United States)

Khan, Adnan Hossain; Shang, Julie Q; Alam, Raquibul

2012-10-15

A sample preparation method for percentage recovery of total sulphur (%S) in reactive mine tailings based on ultrasound-assisted digestion (USAD) and inductively coupled plasma-optical emission spectroscopy (ICP-OES) was developed. The influence of various methodological factors was screened by employing a two-level and three-factor (2³) full factorial design and using KZK-1, a sericite schist certified reference material (CRM), to find the optimal combination of studied factors and %S. Factors such as the sonication time, temperature and acid combination were studied, with the best result identified as 20 min of sonication, 80°C temperature and 1 ml of HNO₃:1 ml of HCl, which can achieve 100% recovery for the selected CRM. Subsequently a fraction of the 2³ full factorial design was applied to mine tailings. The percentage relative standard deviation (%RSD) for the ultrasound method is less than 3.0% for CRM and less than 6% for the mine tailings. The investigated method was verified by X-ray diffraction analysis. The USAD method compared favorably with existing methods such as hot plate assisted digestion method, X-ray fluorescence and LECO™-CNS method. Copyright © 2012 Elsevier B.V. All rights reserved.
Text-mining analysis of mHealth research.

Science.gov (United States)

Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

2017-01-01

In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions
Using text-mining techniques in electronic patient records to identify ADRs from medicine use.

Science.gov (United States)

Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise

2012-05-01

This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.
From university research to innovation Detecting knowledge transfer via text mining

DEFF Research Database (Denmark)

Woltmann, Sabrina; Clemmensen, Line Katrine Harder; Alkærsig, Lars

2016-01-01

...and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern recognition. Text samples for this purpose can include files containing social media contents, company websites and annual reports. The empirical focus in the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent... associated the former with the latter to obtain insights into possible text and semantic relatedness. The text mining methods are extrapolating the correlations, semantic patterns and content comparison of the two corpora to define the document relatedness. We expect the development of a novel tool using...
Vaccine adverse event text mining system for extracting features from vaccine safety reports.

Science.gov (United States)

Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert

2012-01-01

To develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined the primary (diagnosis and cause of death) and secondary features (eg, symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this was equal and better compared to the text classifier (83.1%) and the online tool (40.7%), respectively. Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
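
The evaluation metrics named in this abstract (recall, precision, F-measure, sensitivity and specificity) follow from simple counts of true and false positives and negatives. The sketch below uses the standard definitions; the function names and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from extraction counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) for a binary case classifier."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=90, fp=10)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Recall and sensitivity are the same quantity; the abstract uses the first term for feature extraction and the second for case classification.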

Methods for Mining and Summarizing Text Conversations

CERN Document Server

Carenini, Giuseppe; Murray, Gabriel

2011-01-01

Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

Science.gov (United States)

Lu, Zhiyong; Hirschman, Lynette

2012-01-01

Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
Data Mining Mining Data: MSHA Enforcement Efforts, Underground Coal Mine Safety, and New Health Implications

OpenAIRE

Kniesner, Thomas J.; Leeth, John D.

2003-01-01

Studies of industrial safety regulations, OSHA in particular, often find little effect on worker safety. Critics of the regulatory approach argue that safety standards have little to do with industrial injuries, and defenders of the regulatory approach cite infrequent inspections and low penalties for violating safety standards. We use recently assembled data from the Mine Safety and Health Administration (MSHA) concerning underground coal mine production, safety regulatory activities, and wo...
Mining Sequential Update Summarization with Hierarchical Text Analysis

Directory of Open Access Journals (Sweden)

Chunyun Zhang

2016-01-01

The outbreak of unexpected news events such as large human accidents or natural disasters brings about a new information access problem where traditional approaches fail. News of these events is typically sparse early on and redundant later. Hence, it is very important to get updates and provide individuals with timely and important information about these incidents as they develop, especially when applied in the wireless and mobile Internet of Things (IoT). In this paper, we define the problem of sequential update summarization extraction and present a new hierarchical update mining system which can broadcast useful, new, and timely sentence-length updates about a developing event. The new system proposes a novel method, which incorporates techniques from topic-level and sentence-level summarization. To evaluate the performance of the proposed system, we apply it to the sequential update summarization task of the temporal summarization (TS) track at the Text Retrieval Conference (TREC) 2013 and compute four measurements of the update mining system: the expected gain, expected latency gain, comprehensiveness, and latency comprehensiveness. Experimental results show that our proposed method has good performance.
Text Mining Untuk Analisis Sentimen Review Film Menggunakan Algoritma K-Means [Text Mining for Sentiment Analysis of Film Reviews Using the K-Means Algorithm]

Directory of Open Access Journals (Sweden)

Setyo Budi

2017-02-01

The ease with which people use websites has led to a growing number of text documents containing opinions and information. Over time, these collections of text documents grow large. Text mining is one technique used to mine a collection of text documents so that its essence can be extracted. Several algorithms are used to mine documents for sentiment analysis, one of which is K-Means. In this study, the algorithm used is K-Means. The results show that the accuracy of K-Means was 57.83% on a dataset of 300 positive and 300 negative documents, 56.71% with 700 positive and 700 negative documents, and 50.40% with 1000 positive and 1000 negative documents. From the test results, it is concluded that the larger the dataset used, the lower the accuracy of K-Means. Keywords: Text Mining, Sentiment Analysis, K-Means, Film Review
The Application of Text Mining in Business Research

DEFF Research Database (Denmark)

Preuss, Bjørn

2017-01-01

The aim of this paper is to present a methodological concept in business research that has the potential to become one of the most powerful methods in the upcoming years for researching qualitative phenomena in business and society. It presents a selection of algorithms as well elaborat...... on potential use cases for a text mining based approach to qualitative data analysis....
Compatibility between Text Mining and Qualitative Research in the Perspectives of Grounded Theory, Content Analysis, and Reliability

Science.gov (United States)

Yu, Chong Ho; Jannasch-Pennell, Angel; DiGangi, Samuel

2011-01-01

The objective of this article is to illustrate that text mining and qualitative research are epistemologically compatible. First, like many qualitative research approaches, such as grounded theory, text mining encourages open-mindedness and discourages preconceptions. Contrary to the popular belief that text mining is a linear and fully automated…
MET network in PubMed: a text-mined network visualization and curation system.

Science.gov (United States)

Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian

2016-01-01

Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway. © The Author(s) 2016. Published by Oxford University Press.
Identifying child abuse through text mining and machine learning

NARCIS (Netherlands)

Amrit, Chintan; Paauw, Tim; Aly, Robin; Lavric, Miha

2017-01-01

In this paper, we describe how we used text mining and analysis to identify and predict cases of child abuse in a public health institution. Such institutions in the Netherlands try to identify and prevent different kinds of abuse. A significant part of the medical data that the institutions have on
Data Processing and Text Mining Technologies on Electronic Medical Records: A Review

Directory of Open Access Journals (Sweden)

Wencheng Sun

2018-01-01

Currently, medical institutes generally use electronic medical records (EMR) to record patients' conditions, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and the data mining results. Different types of data require different processing technologies. Most structured data needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contains more health information and requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and analyzes the key techniques in depth. In addition, we make an in-depth study of the applications developed based on text mining, together with the open challenges and research issues for future work.
The Canadian Nuclear Safety Commission regulatory process for decommissioning a uranium mining facility

International Nuclear Information System (INIS)

Scissons, K.; Schryer, D.M.; Goulden, W.; Natomagan, C.

2002-01-01

The Canadian Nuclear Safety Commission (CNSC) regulates uranium mining in Canada. The CNSC regulatory process requires that a licence applicant plan for and commit to future decommissioning before irrevocable decisions are made, and throughout the life of a uranium mine. These requirements include conceptual decommissioning plans and the provision of financial assurances to ensure the availability of funds for decommissioning activities. When an application for decommissioning is submitted to the CNSC, an environmental assessment is required prior to initiating the licensing process. A case study is presented for COGEMA Resources Inc. (COGEMA), which is entering the decommissioning phase with the CNSC for the Cluff Lake uranium mine. As part of the licensing process, CNSC multidisciplinary staff assess the decommissioning plan, associated costs, and the environmental assessment. When the CNSC is satisfied that all of its requirements are met, a decommissioning licence may be issued. (author)
New challenges for text mining: mapping between text and manually curated pathways

Science.gov (United States)

Oda, Kanae; Kim, Jin-Dong; Ohta, Tomoko; Okanohara, Daisuke; Matsuzaki, Takuya; Tateisi, Yuka; Tsujii, Jun'ichi

2008-01-01

Background: Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results: To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions: We believe that the creation of such rich resources and their detailed analysis is the significant first step for accelerating the research of the automatic construction of pathway from text. PMID:18426550
Spectral signature verification using statistical analysis and text mining

Science.gov (United States)

DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.

2016-05-01

In the spectral science community, numerous spectral signatures are stored in databases representative of many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature to arrive at a final qualitative assessment: the textual meta-data and the numerical spectral data. Results associated with the spectral data stored in the Signature Database [1] (SigDB) are presented. The numerical data comprising a sample material's spectrum is validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum is qualitatively analyzed using lexical analysis text mining. This technique analyzes the syntax of the meta-data to provide local learning patterns and trends within the spectral data, indicative of the test spectrum's quality. Text mining applications have successfully been implemented for security [2] (text encryption/decryption), biomedical [3], and marketing [4] applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high and low quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need of an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is
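The spectral angle mapper (SAM) comparison mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each spectrum is a plain list of values sampled at the same bands; the function names `spectral_angle` and `mean_spectrum` are hypothetical and are not taken from the paper, whose ranking rule is not given in the abstract.

```python
import math


def mean_spectrum(population):
    """Band-wise mean of a list of equal-length spectra (hypothetical helper)."""
    n = len(population)
    return [sum(band) / n for band in zip(*population)]


def spectral_angle(test, reference):
    """Angle in radians between two spectra treated as vectors.

    A smaller angle means the test spectrum has a shape closer to the
    reference, regardless of overall brightness (vector length).
    """
    if len(test) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(t * r for t, r in zip(test, reference))
    norm_t = math.sqrt(sum(t * t for t in test))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_t == 0 or norm_r == 0:
        raise ValueError("a spectrum with all-zero values has no direction")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_t * norm_r)))
    return math.acos(cosine)
```

Under these assumptions, a test spectrum would be compared with `spectral_angle(test, mean_spectrum(population))`; how the resulting angle maps to a quality rank is left open here because the abstract does not specify it.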
Evolution of bayesian-related research over time: a temporal text mining task

CSIR Research Space (South Africa)

de Waal, A

2006-06-01

Ronald Reagan’s Radio Addresses? Bayesian Analysis 2006, Volume 1, Number 2, pp. 189-383. 2. Mei Q and Zhai C, 2005. Discovering Evolutionary Theme Patterns from Text – An Exploration of Temporal Text Mining. KDD’05, August 21-24, 2005. Chicago...
Building a glaucoma interaction network using a text mining approach.

Science.gov (United States)

Soliman, Maha; Nasraoui, Olfa; Cooper, Nigel G F

2016-01-01

The volume of biomedical literature and its underlying knowledge base is rapidly expanding, making it beyond the ability of a single human being to read through all the literature. Several automated methods have been developed to help make sense of this dilemma. The present study reports on the results of a text mining approach to extract gene interactions from the data warehouse of published experimental results which are then used to benchmark an interaction network associated with glaucoma. To the best of our knowledge, there is, as yet, no glaucoma interaction network derived solely from text mining approaches. The presence of such a network could provide a useful summative knowledge base to complement other forms of clinical information related to this disease. A glaucoma corpus was constructed from PubMed Central and a text mining approach was applied to extract genes and their relations from this corpus. The extracted relations between genes were checked using reference interaction databases and classified generally as known or new relations. The extracted genes and relations were then used to construct a glaucoma interaction network. Analysis of the resulting network indicated that it bears the characteristics of a small world interaction network. Our analysis showed the presence of seven glaucoma linked genes that defined the network modularity. A web-based system for browsing and visualizing the extracted glaucoma related interaction networks is made available at http://neurogene.spd.louisville.edu/GlaucomaINViewer/Form1.aspx. This study has reported the first version of a glaucoma interaction network using a text mining approach. The power of such an approach is in its ability to cover a wide range of glaucoma related studies published over many years. Hence, a bigger picture of the disease can be established. To the best of our knowledge, this is the first glaucoma interaction network to summarize the known literature. The major findings were a set of
Signal Detection Framework Using Semantic Text Mining Techniques

Science.gov (United States)

Sudarsan, Sithu D.

2009-01-01

Signal detection is a challenging task for regulatory and intelligence agencies. Subject matter experts in those agencies analyze documents, generally containing narrative text, in a time-bound manner for signals by identification, evaluation and confirmation, leading to follow-up action, e.g., recalling a defective product or issuing a public advisory for…
DDMGD: the database of text-mined associations between genes methylated in diseases from different species

KAUST Repository

Raies, A. B.; Mansour, H.; Incitti, R.; Bajic, Vladimir B.

2014-01-01

://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we
A guide to ventilation requirements for uranium mines and mills. Regulatory guide G-221

International Nuclear Information System (INIS)

2003-06-01

The purpose of G-221 is to help persons address the requirements for the submission of ventilation-related information when applying for a Canadian Nuclear Safety Commission (CNSC) licence to site and construct, operate or decommission a uranium mine or mill. This guide is also intended to help applicants for a uranium mine or mill licence understand their operational and maintenance obligations with respect to ventilation systems, and to help CNSC staff evaluate the adequacy of applications for uranium mine and mill licences. This guide is relevant to any application for a CNSC licence to prepare a site for and construct, operate or decommission a uranium mine or mill. In addition to summarizing the ventilation-related obligations of uranium mine and mill licensees, the guide describes and discusses the ventilation-related information that licence applicants should typically submit to meet regulatory requirements. The guide pertains to any ventilation of uranium mines and mills for the purpose of assuring the radiation safety of workers and on-site personnel. This ventilation may be associated with any underground or surface area or premise that is licensable by the CNSC as part of a uranium mine or mill. These areas and premises typically include mine workings, mill buildings, and other areas or premises involving or potentially affected by radiation or radioactive materials. Some examples of the latter include offices, effluent treatment plants, cafeterias, lunch rooms and personnel change-rooms. (author)
The Distribution of the Informative Intensity of the Text in Terms of its Structure (On Materials of the English Texts in the Mining Sphere)

Directory of Open Access Journals (Sweden)

Znikina Ludmila

2017-01-01

The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features contributing to the process of formalization of the scientific text and the preservation of the adequacy of the text with derived semantic information in relation to the primary. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. It also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of a formal conformance, translation with the specific objectives and information needs of the recipient. Some key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme with the aim to explain the possible ways of their adequate translation.
The Distribution of the Informative Intensity of the Text in Terms of its Structure (On Materials of the English Texts in the Mining Sphere)

Science.gov (United States)

Znikina, Ludmila; Rozhneva, Elena

2017-11-01

The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features contributing to the process of formalization of the scientific text and the preservation of the adequacy of the text with derived semantic information in relation to the primary. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. It also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of a formal conformance, translation with the specific objectives and information needs of the recipient. Some key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme with the aim to explain the possible ways of their adequate translation.

Using Text Mining to Uncover Students' Technology-Related Problems in Live Video Streaming

Science.gov (United States)

Abdous, M'hammed; He, Wu

2011-01-01

Because of their capacity to sift through large amounts of data, text mining and data mining are enabling higher education institutions to reveal valuable patterns in students' learning behaviours without having to resort to traditional survey methods. In an effort to uncover live video streaming (LVS) students' technology-related problems and to…
Text mining approach to predict hospital admissions using early medical records from the emergency department.

Science.gov (United States)

Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz

2017-04-01

Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing of text records and for predicting hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel). Prediction performance is evaluated by F1-scores. Precision and Recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
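The three text representations named in this abstract (binary, term frequency, and term frequency-inverse document frequency over unigrams, bigrams and trigrams) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: whitespace tokenization, the unsmoothed IDF variant log(N/df), and the names `ngrams`, `features` and `weight` are all choices made here for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=3):
    """Unigrams, bigrams and trigrams of a whitespace-tokenized record."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats


def weight(docs, scheme="tfidf"):
    """Return one {feature: weight} dict per document.

    scheme is "binary" (presence), "tf" (raw count) or "tfidf"
    (count * log(N / document frequency), an assumed IDF variant).
    """
    counts = [Counter(features(d)) for d in docs]
    # Document frequency: in how many documents each feature occurs.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    rows = []
    for c in counts:
        if scheme == "binary":
            rows.append({t: 1 for t in c})
        elif scheme == "tf":
            rows.append(dict(c))
        elif scheme == "tfidf":
            rows.append({t: f * math.log(n_docs / df[t]) for t, f in c.items()})
        else:
            raise ValueError("unknown scheme: " + scheme)
    return rows
```

With two invented records, `weight(["chest pain", "chest pain fever"], "tfidf")` gives zero weight to features shared by both records and a positive weight to `fever`, which is the property that makes TF-IDF useful before a feature selection step such as χ².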
Technical Status Report of the Regulatory Assistance Project: October 2001-February 2003

Energy Technology Data Exchange (ETDEWEB)

2003-08-01

This report details the work undertaken from October 2001 to February 2003 by the Regulatory Assistance Project under subcontract to the National Renewable Energy Laboratory. The objectives of this work were to develop regulatory policy options that would reduce the institutional and infrastructure barriers to full-value deployment of distributed power systems. Specific tasks included leading technical workshops on removing or overcoming regulatory barriers to distributed resources for state utility regulators and developing a draft model rule on emission performance standards for distributed generation.
An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature.

Science.gov (United States)

Trybula, Walter J.; Wyllys, Ronald E.

2000-01-01

Addresses an approach to the discovery of scientific knowledge through an examination of data mining and text mining techniques. Presents the results of experiments that investigated knowledge acquisition from a selected set of technical documents by domain experts. (Contains 15 references.) (Author/LRW)
CrossRef text and data mining services

Directory of Open Access Journals (Sweden)

Rachael Lammey

2015-02-01

CrossRef is an association of scholarly publishers that develops shared infrastructure to support more effective scholarly communications. It is a registration agency for the digital object identifier (DOI), and has built additional services for CrossRef members around the DOI and the bibliographic metadata that publishers deposit in order to register DOIs for their publications. Among these services are CrossCheck, powered by iThenticate, which helps publishers screen for plagiarism in submitted manuscripts, and FundRef, which gives publishers a standard way to report funding sources for published scholarly research. To add to these services, CrossRef launched CrossRef text and data mining services in May 2014. This article will explain the thinking behind CrossRef launching this new service, what it offers to publishers and researchers alike, how publishers can participate in it, and the uptake of the service so far.
Mining consumer health vocabulary from community-generated text.

Science.gov (United States)

Vydiswaran, V G Vinod; Mei, Qiaozhu; Hanauer, David A; Zheng, Kai

2014-01-01

Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link them to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional term with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text.
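The abstract states only that the measure leverages "the ratio of frequency of occurrence"; its exact form is not given. The Python sketch below shows one plausible reading as an assumption: compare a term's smoothed relative frequency in a consumer corpus with that in a professional corpus. The function name `label_term`, the add-one smoothing, and the threshold of 1 are hypothetical choices for illustration.

```python
from collections import Counter


def label_term(term, consumer_tokens, professional_tokens):
    """Label a term by where it is relatively more frequent.

    Returns (label, ratio), where ratio is the term's smoothed relative
    frequency in the consumer corpus divided by that in the professional
    corpus. Ratios above 1 are labeled "consumer"; ties and lower ratios
    are labeled "professional" (an arbitrary assumed convention).
    """
    consumer = Counter(consumer_tokens)
    professional = Counter(professional_tokens)
    vocab_size = len(set(consumer) | set(professional))
    # Add-one smoothing keeps the ratio finite for terms absent from one corpus.
    rel_consumer = (consumer[term] + 1) / (len(consumer_tokens) + vocab_size)
    rel_professional = (professional[term] + 1) / (len(professional_tokens) + vocab_size)
    ratio = rel_consumer / rel_professional
    label = "consumer" if ratio > 1 else "professional"
    return label, ratio
```

In practice the two token lists would come from sources like those named in the abstract (a health forum and MedLine abstracts); the sketch handles single-token terms only.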
The intrinsic roles of regulatory instruments in mining operations

Directory of Open Access Journals (Sweden)

Kola O. Odeku

2015-05-01

Prospecting and exploiting natural mineral resources for economic growth and development could be beneficial if done in a sustainable manner. However, if the operation is done in a way that causes harm to the environment and people, this will amount to unsustainable mining activity and anti-sustainable development. Therefore, there is a need to ensure that appropriate and adequate plans and programmes are put in place in order to mitigate, minimise and avoid negative environmental impacts. Against the backdrop of these concerns and the need to ensure that the environment is not degraded and destroyed, South Africa, as one of the countries that promote sustainable prospecting and mining, has put in place and is currently implementing tools known as the environmental management plan and programme to regulate and control all prospecting and mining activities. These tools contain a bundle of remedial actions in the form of compensation, rehabilitation and restoration of any harm done to the environment during the course of mining activities. They also contain information on mitigation and ingredients for a good practice approach on how to conduct sustainable prospecting and mining. This article looks at the intrinsic roles of these tools and accentuates the importance and operations of their use in the decision making processe
OSCAR4: a flexible architecture for chemical text-mining

Directory of Open Access Journals (Sweden)

Jessop David M

2011-10-01

The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry specific text-mining tools can be built, and its development and usage are discussed.
Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.

Science.gov (United States)

Vazquez, Miguel; Krallinger, Martin; Leitner, Florian; Valencia, Alfonso

2011-06-01

Providing prior knowledge about biological properties of chemicals, such as kinetic values, protein targets, or toxic effects, can facilitate many aspects of drug development. Chemical information is rapidly accumulating in all sorts of free text documents like patents, industry reports, or scientific articles, which has motivated the development of specifically tailored text mining applications. Despite the potential gains, chemical text mining still faces significant challenges. One of the most salient is the recognition of chemical entities mentioned in text. To help practitioners contribute to this area, a good portion of this review is devoted to this issue, and presents the basic concepts and principles underlying the main strategies. The technical details are introduced and accompanied by relevant bibliographic references. Other tasks discussed are retrieving relevant articles, identifying relationships between chemicals and other entities, or determining the chemical structures of chemicals mentioned in text. This review also introduces a number of published applications that can be used to build pipelines in topics like drug side effects, toxicity, and protein-disease-compound network analysis. We conclude the review with an outlook on how we expect the field to evolve, discussing its possibilities and its current limitations. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mining free-text medical records for companion animal enteric syndrome surveillance.

Science.gov (United States)

Anholt, R M; Berezowski, J; Jamal, I; Ribble, C; Stephen, C

2014-03-01

Large amounts of animal health care data are present in veterinary electronic medical records (EMR) and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely in free-text without clinical coding or fixed vocabulary. Text-mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study, EMRs were extracted from veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text-miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text-miner retrieved cases with enteric signs with a sensitivity of 87.6% (95%CI, 80.4-92.9%) and a specificity of 99.3% (95%CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals. Copyright © 2014 Elsevier B.V. All rights reserved.
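Sensitivity and specificity, the two accuracy measures reported in this abstract, follow directly from comparing the text-miner's labels with the reviewers' labels. The Python sketch below shows the arithmetic only; the function name `diagnostic_accuracy` is hypothetical, and any labels passed to it in an example are invented, since the abstract does not report the underlying case counts.

```python
def diagnostic_accuracy(predicted, reference):
    """Return (sensitivity, specificity) for paired boolean labels.

    predicted: labels from the automatic classifier (True = case found).
    reference: labels from human reviewers, treated as the gold standard.
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1      # classifier and reviewer both say positive
        elif p and not r:
            fp += 1      # classifier flags a case the reviewer rejected
        elif not p and r:
            fn += 1      # classifier misses a reviewer-confirmed case
        else:
            tn += 1      # both say negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    sensitivity = tp / (tp + fn)   # share of true cases that were retrieved
    specificity = tn / (tn + fp)   # share of non-cases correctly excluded
    return sensitivity, specificity
```

For surveillance use, high specificity matters because most records are non-cases, so even a small false-positive rate would otherwise swamp the retrieved set.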
Data Mining Mining Data: MSHA Enforcement Efforts, Underground Coal Mine Safety, and New Health Policy Implications

OpenAIRE

Thomas J. Kniesner; John D. Leeth

2003-01-01

Studies of industrial safety regulations, Occupational Safety and Health Administration (OSHA) in particular, often find little effect on worker safety. Critics of the regulatory approach argue that safety standards have little to do with industrial injuries and defenders of the regulatory approach cite infrequent inspections and low fines for violating safety standards. We use recently assembled data from the Mine Safety and Health Administration (MSHA) concerning underground coal mine produ...
Canadian uranium mines and mills evolution of regulatory expectations and requirements for effluent treatment

International Nuclear Information System (INIS)

LeClair, J.; Ashley, F.

2006-01-01

The regulation of uranium mining in Canada has changed over time as our understanding and concern for impacts on both human and non-human biota has evolved. Since the mid-1970s and early 1980s, new uranium mine and mill developments have been the subject of environmental assessments to assess and determine the significance of environmental effects throughout the project life cycle including the post-decommissioning phase. Water treatment systems have subsequently been improved to limit potential effects by reducing the concentration of radiological and non-radiological contaminants in the effluent discharge and the total loadings to the environment. This paper examines current regulatory requirements and expectations and how these impact uranium mining/milling practices. It also reviews current water management and effluent treatment practices and performance. Finally, it examines the issues and challenges for existing effluent treatment systems and identifies factors to be considered in optimizing current facilities and future facility designs. (author)
U-Compare: share and compare text mining tools with UIMA

Science.gov (United States)

Kano, Yoshinobu; Baumgartner, William A.; McCrohon, Luke; Ananiadou, Sophia; Cohen, K. Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi

2009-01-01

Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. Availability: http://u-compare.org/ Contact: kano@is.s.u-tokyo.ac.jp PMID:19414535
Text Mining of UU-ITE Implementation in Indonesia

Science.gov (United States)

Hakim, Lukmanul; Kusumasari, Tien F.; Lubis, Muharman

2018-04-01

At present, social media and networks act as one of the main platforms for sharing information, ideas, thoughts and opinions. Many people share their knowledge and express their views on the specific topics or current hot issues that interest them. Social media texts contain rich information about complaints, comments, recommendations and suggestions made in spontaneous reaction or response to government initiatives or policies intended to address certain issues. This study examines the sentiment of netizens, as the vocal part of the citizenry, about the implementation of UU ITE, the first cyberlaw in Indonesia, as a means to identify the current tendency of citizen perception. To perform text mining, this study used the Twitter REST API, while R was used for classification analysis based on hierarchical clustering.
Evaluating a Bilingual Text-Mining System with a Taxonomy of Key Words and Hierarchical Visualization for Understanding Learner-Generated Text

Science.gov (United States)

Kong, Siu Cheung; Li, Ping; Song, Yanjie

2018-01-01

This study evaluated a bilingual text-mining system, which incorporated a bilingual taxonomy of key words and provided hierarchical visualization, for understanding learner-generated text in the learning management systems through automatic identification and counting of matching key words. A class of 27 in-service teachers studied a course…
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL

Science.gov (United States)

Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo

2011-01-01

Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to effectively manually process their contents to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experiences of exploring such methods mainly in three areas including historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL, have shown that obtaining useful results requires substantial unanticipated efforts - from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefit from short-term effort avoidance through automation in this area. Surprisingly we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates some of these benefits and our important lessons learned from the process of preparing and applying text mining to large unstructured system artifacts at JPL aiming to benefit future TM applications in similar problem domains and also in hope for being extended to broader areas of applications.
76 FR 64043 - Iowa Regulatory Program

Science.gov (United States)

2011-10-17

...) Requirements for permits for special categories of mining. 27--40.41(207) Permanent regulatory program--small... DEPARTMENT OF THE INTERIOR Office of Surface Mining Reclamation and Enforcement 30 CFR Part 915 [Docket No. IA-016-FOR; Docket ID: OSM-2011-0014] Iowa Regulatory Program AGENCY: Office of Surface Mining...
Text Mining the History of Medicine.

Science.gov (United States)

Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

2016-01-01

Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while
Text Mining for Sentiment Analysis of Film Reviews Using the K-Means Algorithm

OpenAIRE

Setyo Budi

2017-01-01

The ease with which people use websites has led to a growing number of text documents containing opinions and information. Over a long period, the volume of text documents grows large. Text mining is one of the techniques used to mine a collection of text documents so that its essence can be extracted. Several algorithms are used to mine documents for sentiment analysis, one of which is K-Means. The algorithm used in this study is K-Means. The results...
Regulatory issues associated with exclusion, exemption, and clearance related to the mining and minerals processing industries

International Nuclear Information System (INIS)

Metcalf, P.; Woude, S. van der; Keenan, N.; Guy, S.

1997-01-01

The concepts of exclusion, exemption and clearance have been established in international recommendations and standards for radiation protection and the management of radioactive waste in recent years. The consistent application of these concepts has given rise to various problems in different spheres of use. This is particularly the case in the mining and minerals processing industries dealing with materials exhibiting elevated concentrations of naturally occurring radionuclides. This paper takes the South African mining industry as an example and highlights some of the issues that have arisen in applying these concepts within a regulatory control regime. (author)

Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery.

Science.gov (United States)

Gonzalez, Graciela H; Tahsin, Tasnia; Goodale, Britton C; Greene, Anna C; Greene, Casey S

2016-01-01

Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine. © The Author 2015. Published by Oxford University Press.
Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

Science.gov (United States)

Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia

2015-01-01

Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
Using text mining for study identification in systematic reviews: a systematic review of current approaches.

Science.gov (United States)

O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia

2015-01-14

The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities. Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings. The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall). Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in 'live' reviews. The use of text mining as a 'second screener' may also be used cautiously
A Framework for Text Mining in Scientometric Study: A Case Study in Biomedicine Publications

Science.gov (United States)

Silalahi, V. M. M.; Hardiyati, R.; Nadhiroh, I. M.; Handayani, T.; Rahmaida, R.; Amelia, M.

2018-04-01

Data on Indonesian research publications in the domain of biomedicine has been collected to be text mined for the purpose of a scientometric study. The goal is to build a predictive model that classifies research publications by their potential for downstreaming. The model is based on the drug development processes adapted from the literature. An effort is described to build the conceptual model and to develop a corpus of research publications in the domain of Indonesian biomedicine. Then an investigation is conducted into the problems associated with building a corpus and validating the model. Based on our experience, a framework is proposed to manage the scientometric study based on text mining. Our method shows the effectiveness of conducting a scientometric study based on text mining in order to get a valid classification model. This valid model is mainly supported by iterative and close interactions with the domain experts, from identifying the issues and building a conceptual model to labelling, validation and interpretation of results.
tmBioC: improving interoperability of text-mining tools with BioC.

Science.gov (United States)

Khare, Ritu; Wei, Chih-Hsuan; Mao, Yuqing; Leaman, Robert; Lu, Zhiyong

2014-01-01

The lack of interoperability among biomedical text-mining tools is a major bottleneck in creating more complex applications. Despite the availability of numerous methods and techniques for various text-mining tasks, combining different tools requires substantial effort and time owing to heterogeneity and variety in data formats. In response, BioC is a recent proposal that offers a minimalistic approach to tool interoperability by stipulating minimal changes to existing tools and applications. BioC is a family of XML formats that define how to present text documents and annotations, and also provides easy-to-use functions to read/write documents in the BioC format. In this study, we introduce our text-mining toolkit, which is designed to perform several challenging and significant tasks in the biomedical domain, and repackage the toolkit into BioC to enhance its interoperability. Our toolkit consists of six state-of-the-art tools for named-entity recognition, normalization and annotation (PubTator) of genes (GenNorm), diseases (DNorm), mutations (tmVar), species (SR4GN) and chemicals (tmChem). Although developed within the same group, each tool is designed to process input articles and output annotations in a different format. We modify these tools and enable them to read/write data in the proposed BioC format. We find that, using the BioC family of formats and functions, only minimal changes were required to build the newer versions of the tools. The resulting BioC wrapped toolkit, which we have named tmBioC, consists of our tools in BioC, an annotated full-text corpus in BioC, and a format detection and conversion tool. Furthermore, through participation in the 2013 BioCreative IV Interoperability Track, we empirically demonstrate that the tools in tmBioC can be more efficiently integrated with each other as well as with external tools: our experimental results show that using BioC reduces the lines of code needed for text-mining tool integration by more than 60%. The tmBioC toolkit
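To make the idea of a shared XML annotation format concrete, the following is a minimal sketch of reading annotations from a BioC-style document with Python's standard library. The sample document is invented for illustration, and the element layout (collection, document, passage, annotation with `infon` and `location` children) is assumed from the published BioC format rather than taken from this abstract; `read_annotations` is a hypothetical helper, not part of tmBioC.

```python
import xml.etree.ElementTree as ET

# Invented sample; layout assumed to follow the BioC collection format.
SAMPLE = """<collection>
  <source>example</source>
  <date>2014-01-01</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 mutations are linked to breast cancer.</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Disease</infon>
        <location offset="30" length="13"/>
        <text>breast cancer</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation type, covered text) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            # Annotation offsets are document-level, so subtract the passage offset.
            passage_offset = int(passage.findtext("offset"))
            passage_text = passage.findtext("text") or ""
            for annotation in passage.iter("annotation"):
                infons = {i.get("key"): i.text for i in annotation.findall("infon")}
                location = annotation.find("location")
                start = int(location.get("offset")) - passage_offset
                end = start + int(location.get("length"))
                rows.append((doc_id, infons.get("type"), passage_text[start:end]))
    return rows
```

Because every tool reads and writes the same element layout, a reader like this one can be reused across tools, which is the kind of saving in integration code the abstract reports.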
Using text mining for study identification in systematic reviews: a systematic review of current approaches

OpenAIRE

O'Mara-Eves, Alison; Thomas, James; McNaught, John; Miwa, Makoto; Ananiadou, Sophia

2015-01-01

Background The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic...
77 FR 71445 - Regulatory and Administrative Waivers Granted for Multifamily Housing Programs To Assist With...

Science.gov (United States)

2012-11-30

... DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT [Docket No. 5677-N-01] Regulatory and Administrative Waivers Granted for Multifamily Housing Programs To Assist With Recovery and Relief in Sandy Disaster... in the disaster areas is widespread, and the need for regulatory relief in many areas pertaining to...
Complementing the Numbers: A Text Mining Analysis of College Course Withdrawals

Science.gov (United States)

Michalski, Greg V.

2011-01-01

Excessive college course withdrawals are costly to the student and the institution in terms of time to degree completion, available classroom space, and other resources. Although generally well quantified, detailed analysis of the reasons given by students for course withdrawal is less common. To address this, a text mining analysis was performed…
A Formal Framework on the Semantics of Regulatory Relations and Their Presence as Verbs in Biomedical Texts

DEFF Research Database (Denmark)

Zambach, Sine

2009-01-01

Relations used in biomedical ontologies and expressed in biomedical texts can be very general or very specific. Regulatory relations are used widely in regulatory networks, for example, and therefore they appear systematically and highly frequently in biomedical texts. This work focuses on the lo...
The regulatory role of the Hungarian Geological Survey in the closure of Mecsek uranium mine

International Nuclear Information System (INIS)

Hamor, T.; Gombor, L.

2001-01-01

Under Mining Act XLIII established in 1993, the Hungarian Geological Survey was given a wide range of authority related to the environment, mining, nuclear and general constructions. In implementing these tasks the Survey will be supported by the well-established Geological Institute of Hungary and the Eötvös Loránd Geophysical Institute. The Survey's role in the nuclear field includes the licensing of plans and reports on geologically related research for any nuclear facility. The Hungarian Geological Survey is also co-authority on matters related to the establishment, construction, modification, closure and environmental protection of nuclear facilities in general, and on all matters related to uranium mining. The Survey's regulatory activity in radioactive waste management follows the Decree of the Minister of Industry and Tourism 62/1997, which is based on the Atomic Energy Act CXVI of 1996. These regulations were prepared in harmony with the OECD Nuclear Energy Agency and the International Atomic Energy Agency conventions, standards and guides and those of other countries. Case histories on the application of these regulations to the closure of Mecsek uranium mine and the operation of the research laboratory tunnel for long-lived, high level radioactive waste are presented here. (author)
TIME SERIES ANALYSIS ON STOCK MARKET FOR TEXT MINING CORRELATION OF ECONOMY NEWS

Directory of Open Access Journals (Sweden)

Sadi Evren SEKER

2014-01-01

This paper proposes an information retrieval method for economy news. The effect of economy news is studied at the word level, and stock market values are taken as the ground truth. The correlation between stock market prices and economy news is an already addressed problem for most countries. The most well-known approach is to apply text mining to the news and time series analysis techniques to stock market closing values, and then to apply classification or clustering algorithms to the extracted features. This study goes further and asks which time series analysis techniques are available for stock market closing values and which one is the most suitable. In this study, the news and their dates are collected into a database and text mining is applied to the news; the text mining part has been kept simple, using only the term frequency-inverse document frequency method. For the time series analysis part, we have studied 10 different methods such as random walk, moving average, acceleration, Bollinger band, price rate of change, periodic average, difference, momentum or relative strength index and their variations. We also explain these techniques in a comparative way, and we have applied the methods to Turkish Stock Market closing values for a period of more than 2 years. In addition, we have applied the term frequency-inverse document frequency method to the economy news of one of the high-circulating newspapers in Turkey.
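As an illustration of the two ingredients named in this abstract, the sketch below computes term frequency-inverse document frequency weights for tokenised news items and three of the listed price indicators (moving average, momentum, price rate of change). The exact weighting variant and indicator definitions used by the author are not given here, so the common textbook forms are assumed; all function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised, non-empty document.

    Assumes tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def moving_average(prices, window):
    """Mean of the last `window` closing values, one value per full window."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]


def momentum(prices, lag):
    """Difference between each closing value and the one `lag` steps earlier."""
    return [prices[i] - prices[i - lag] for i in range(lag, len(prices))]


def rate_of_change(prices, lag):
    """Percentage change relative to the closing value `lag` steps earlier."""
    return [100.0 * (prices[i] - prices[i - lag]) / prices[i - lag]
            for i in range(lag, len(prices))]
```

A term that occurs in every news item receives a weight of zero under this definition, so only terms that distinguish one day's news from another can correlate with the indicator series.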
Sentiment analysis of Arabic tweets using text mining techniques

Science.gov (United States)

Al-Horaibi, Lamia; Khan, Muhammad Badruddin

2016-07-01

Sentiment analysis has become a flourishing field of text mining and natural language processing. Sentiment analysis aims to determine whether a text is written to express positive, negative, or neutral emotions about a certain domain. Most sentiment analysis researchers focus on English texts, with very limited resources available for other complex languages, such as Arabic. In this study, the target was to develop an initial model that performs satisfactorily and measures Arabic Twitter sentiment using a machine learning approach, with Naïve Bayes and Decision Tree as the classification algorithms. The dataset used contains more than 2,000 Arabic tweets collected from Twitter. We performed several experiments to check the performance of the two classifiers using different combinations of text-processing functions. We found that the available facilities for Arabic text processing need to be built from scratch or improved to develop accurate classifiers. The small functionalities we developed in a Python environment helped improve the results and showed that sentiment analysis in the Arabic domain needs a lot of work on the lexicon side.
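For readers unfamiliar with the first of the two classifiers, here is a minimal sketch of a multinomial Naïve Bayes sentiment classifier with add-one smoothing. It is not the authors' implementation: the class name is hypothetical, tokenisation is assumed to have been done already, and English placeholder tokens stand in for preprocessed Arabic tweets.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over pre-tokenised texts (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for word in tokens:
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage, with invented training data:

```python
model = NaiveBayesSentiment().fit(
    [["good", "great"], ["bad", "awful"], ["good", "nice"], ["bad", "poor"]],
    ["positive", "negative", "positive", "negative"],
)
model.predict(["good"])
```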
Cluo: Web-Scale Text Mining System For Open Source Intelligence Purposes

Directory of Open Access Journals (Sweden)

Przemyslaw Maciolek

2013-01-01

The amount of textual information published on the Internet is considered to be in billions of web pages, blog posts, comments, social media updates and others. Analyzing such quantities of data requires a high level of distribution, of both data and computing. This is especially true in the case of complex algorithms, often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system, which extracts and analyzes significant quantities of openly available information.
Text Mining Metal-Organic Framework Papers.

Science.gov (United States)

Park, Sanghoon; Kim, Baekjun; Choi, Sihoon; Boyd, Peter G; Smit, Berend; Kim, Jihan

2018-02-26

We have developed a simple text mining algorithm that allows us to identify surface areas and pore volumes of metal-organic frameworks (MOFs) using manuscript HTML files as inputs. The algorithm searches for common units (e.g., m²/g, cm³/g) associated with these two quantities to facilitate the search. From the sample set data of over 200 MOFs, the algorithm managed to identify 90% and 88.8% of the correct surface area and pore volume values, respectively. Further application to a test set of randomly chosen MOF HTML files yielded 73.2% and 85.1% accuracies for the two respective quantities. Most of the errors stem from unorthodox sentence structures that made it difficult to identify the correct data, as well as bolded notations of MOFs (e.g., 1a) that made it difficult to identify their real names. These types of tools will become useful when it comes to discovering structure-property relationships among MOFs as well as collecting a large set of data for references.
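The unit-driven search described in this abstract can be sketched with regular expressions. This is an illustration under stated assumptions, not the authors' algorithm: the text is assumed to have been stripped of HTML tags already, only plain decimal numbers directly preceding the unit are captured, and the pattern and function names are hypothetical.

```python
import re

# A plain decimal number; thousands separators and ranges are not handled.
NUMBER = r"(\d+(?:\.\d+)?)"

# Units may appear as "m2/g", "m 2 /g" (extraction residue) or "m²/g".
SURFACE_AREA = re.compile(NUMBER + r"\s*m\s*[2²]\s*/\s*g")
PORE_VOLUME = re.compile(NUMBER + r"\s*cm\s*[3³]\s*/\s*g")


def extract_mof_properties(text):
    """Return every value that is immediately followed by one of the two units."""
    return {
        "surface_area_m2_per_g": [float(v) for v in SURFACE_AREA.findall(text)],
        "pore_volume_cm3_per_g": [float(v) for v in PORE_VOLUME.findall(text)],
    }
```

Anchoring on the unit keeps the search simple, but it also shows why unusual sentence structures cause errors: a value separated from its unit, or a unit shared by several values, is not captured by patterns of this kind.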
Application of Ferulic Acid for Alzheimer's Disease: Combination of Text Mining and Experimental Validation.

Science.gov (United States)

Meng, Guilin; Meng, Xiulin; Ma, Xiaoye; Zhang, Gengping; Hu, Xiaolin; Jin, Aiping; Zhao, Yanxin; Liu, Xueyuan

2018-01-01

Alzheimer's disease (AD) is an increasing concern in human health. Despite significant research, highly effective drugs to treat AD are lacking. The present study describes the text mining process to identify drug candidates from a traditional Chinese medicine (TCM) database, along with associated protein target mechanisms. We carried out text mining to identify literature that referenced both AD and TCM and focused on identifying compounds and protein targets of interest. After targeting one potential TCM candidate, corresponding protein-protein interaction (PPI) networks were assembled in STRING to decipher the most likely mechanism of action. This was followed by validation using Western blot and co-immunoprecipitation in an AD cell model. The text mining strategy using a vast amount of AD-related literature and the TCM database identified curcumin, whose major component was ferulic acid (FA). This was used as a key candidate compound for further study. Using the top calculated interaction score in STRING, BACE1 and MMP2 were implicated in the activity of FA in AD. Exposure of SHSY5Y-APP cells to FA resulted in a decrease in expression levels of BACE-1 and APP, while the expression of MMP-2 and MMP-9 increased in a dose-dependent manner. This suggests that FA-induced BACE1 and MMP2 pathways may be novel potential mechanisms involved in AD. The text mining of literature and the TCM database related to AD suggested FA as a promising TCM ingredient for the treatment of AD. Potential mechanisms interconnected and integrated with Aβ aggregation inhibition and extracellular matrix remodeling underlying the activity of FA were identified using in vitro studies.
Text Mining for Precision Medicine: Bringing structure to EHRs and biomedical literature to understand genes and health

Science.gov (United States)

Simmons, Michael; Singhal, Ayush; Lu, Zhiyong

2018-01-01

The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text — found in biomedical publications and clinical notes — is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine. PMID:27807747
Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

Science.gov (United States)

Simmons, Michael; Singhal, Ayush; Lu, Zhiyong

2016-01-01

The key question of precision medicine is whether it is possible to find clinically actionable granularity in diagnosing disease and classifying patient risk. The advent of next-generation sequencing and the widespread adoption of electronic health records (EHRs) have provided clinicians and researchers a wealth of data and made possible the precise characterization of individual patient genotypes and phenotypes. Unstructured text, found in biomedical publications and clinical notes, is an important component of genotype and phenotype knowledge. Publications in the biomedical literature provide essential information for interpreting genetic data. Likewise, clinical notes contain the richest source of phenotype information in EHRs. Text mining can render these texts computationally accessible and support information extraction and hypothesis generation. This chapter reviews the mechanics of text mining in precision medicine and discusses several specific use cases, including database curation for personalized cancer medicine, patient outcome prediction from EHR-derived cohorts, and pharmacogenomic research. Taken as a whole, these use cases demonstrate how text mining enables effective utilization of existing knowledge sources and thus promotes increased value for patients and healthcare systems. Text mining is an indispensable tool for translating genotype-phenotype data into effective clinical care that will undoubtedly play an important role in the eventual realization of precision medicine.
Coronary artery disease risk assessment from unstructured electronic health records using text mining.

Science.gov (United States)

Jonnagaddala, Jitendra; Liaw, Siaw-Teng; Ray, Pradeep; Kumar, Manish; Chang, Nai-Wen; Dai, Hong-Jie

2015-12-01

Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which may subsequently lead to prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives if the data is not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents methods to extract Framingham risk factors from unstructured electronic health records using clinical text mining and to calculate 10-year coronary artery disease risk scores in a cohort of diabetic patients. We developed a rule-based system to extract risk factors: age, gender, total cholesterol, HDL-C, blood pressure, diabetes history and smoking history. The results showed that the output from the text mining system was reliable, but there was a significant amount of missing data to calculate the Framingham risk score. A systematic approach for understanding missing data was followed by implementation of imputation strategies. An analysis of the 10-year Framingham risk scores for coronary artery disease in this cohort has shown that the majority of the diabetic patients are at moderate risk of CAD. Copyright © 2015 Elsevier Inc. All rights reserved.
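A rule-based extractor of the kind described in this abstract can be sketched as a small set of regular expressions applied to a clinical note. The patterns, field names and sample wording below are assumptions made for illustration and are far simpler than a production system; fields that are not mentioned come back as `None`, which mirrors the missing-data problem the study reports.

```python
import re

# Hypothetical patterns: a keyword, a short gap of non-digits, then the value.
PATTERNS = {
    "total_cholesterol": re.compile(r"total cholesterol\D{0,20}?(\d{2,3})", re.I),
    "hdl_c": re.compile(r"\bHDL(?:-C)?\D{0,20}?(\d{2,3})", re.I),
    "systolic_bp": re.compile(
        r"\b(?:BP|blood pressure)\D{0,20}?(\d{2,3})\s*/\s*\d{2,3}", re.I
    ),
}
SMOKER = re.compile(r"\b(?:current smoker|smokes)\b", re.I)
NON_SMOKER = re.compile(r"\b(?:non-?smoker|never smoked|denies smoking)\b", re.I)


def extract_risk_factors(note):
    """Return the risk factors found in one free-text note (None if absent)."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        found[name] = int(match.group(1)) if match else None
    # Check negated mentions first so "non-smoker" is not read as a smoker.
    if NON_SMOKER.search(note):
        found["smoker"] = False
    elif SMOKER.search(note):
        found["smoker"] = True
    else:
        found["smoker"] = None
    return found
```

Counting the `None` values per field across a cohort is one simple way to quantify how much data would have to be imputed before a risk score can be calculated.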
Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.

Science.gov (United States)

Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J

2017-08-01

Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. These results highlight the superiority of text mining algorithms applied to electronic
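The counts reported in this abstract imply how many patients each approach identified on its own, which is the population from which the validation samples were drawn. The short Python sketch below works through that arithmetic using only the figures quoted above. The helper names `unique_counts` and `ppv` are assumptions for illustration, and the counts passed to `ppv` are hypothetical, chosen only to show how a value such as 0.95 arises.

```python
def unique_counts(text_mining_total, icd9_total, both):
    """Return (unique to text mining, unique to ICD-9) patient counts."""
    return text_mining_total - both, icd9_total - both


def ppv(true_positives, flagged):
    """Positive predictive value: confirmed cases / all flagged cases."""
    return true_positives / flagged


# Figures quoted in the abstract.
tas_unique = unique_counts(7115, 8272, 4346)
cad_unique = unique_counts(9247, 6913, 6024)
print(tas_unique)  # (2769, 3926)
print(cad_unique)  # (3223, 889)

# Hypothetical review outcome: 190 confirmed cases in a sample of 200.
print(ppv(190, 200))  # 0.95
```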
Regulatory preparations towards commencement of uranium mining and processing of radioactive ores in Tanzania

International Nuclear Information System (INIS)

Gurisha, M.; Kim, C-L.

2014-01-01

The regulatory preparatory work undertaken by the government of the United Republic of Tanzania through the Tanzania Atomic Energy Commission (TAEC) following the Mkuyu River Uranium Project definitive feasibility study is discussed. The project, which has been taken over by ARMZ Uranium One, acquired a construction permit in April 2013, whereby 345 km² of land inside the 50,000 km² world heritage Selous Game Reserve was allocated for the purpose. The project has been realized through the government effort to strengthen the regulatory framework via the revised Atomic Energy Act No.7 of 2003, preparations of Radiation Safety in Mining and Radioactive Ores Regulations of 2011, and the human resource capacity development in areas related to inspection and licensing. Sample collection in Bahi and Manyoni areas in the central part of the country to investigate uranium uptake from the plants and radioactivity from water and plant samples is ongoing. The regulatory preparatory work will provide an opportunity to the public to comprehend the measures undertaken by TAEC to protect human health and the environment. (author)

Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan.

Science.gov (United States)

Olofsson, Hanna; Brolund, Agneta; Hellberg, Christel; Silverstein, Rebecca; Stenström, Karin; Österberg, Marie; Dagerhamn, Jessica

2017-09-01

One time-consuming aspect of conducting systematic reviews is the task of sifting through abstracts to identify relevant studies. One promising approach for reducing this burden uses text mining technology to identify those abstracts that are potentially most relevant for a project, allowing those abstracts to be screened first. To examine the effectiveness of the text mining functionality of the abstract screening tool Rayyan. User experiences were collected. Rayyan was used to screen abstracts for 6 reviews in 2015. After screening 25%, 50%, and 75% of the abstracts, the screeners logged the relevant references identified. A survey was sent to users. After screening half of the search result with Rayyan, 86% to 99% of the references deemed relevant to the study were identified. Of those studies included in the final reports, 96% to 100% were already identified in the first half of the screening process. Users rated Rayyan 4.5 out of 5. The text mining function in Rayyan successfully helped reviewers identify relevant studies early in the screening process. Copyright © 2017 John Wiley & Sons, Ltd.
The Feasibility of Using Large-Scale Text Mining to Detect Adverse Childhood Experiences in a VA-Treated Population.

Science.gov (United States)

Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H

2015-12-01

Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf war veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.
Beating Obesity: Factors Associated with Interest in Workplace Weight Management Assistance in the Mining Industry

Directory of Open Access Journals (Sweden)

Tamara D. Street

2017-03-01

Conclusion: Weight management programs should provide information, motivation, and trouble-shooting assistance to meet the needs of at-risk mining employees, including those who are attempting to change and maintain behaviors to achieve a healthy weight and be suitably fit for work.
Towards A Model Of Knowledge Extraction Of Text Mining For Palliative Care Patients In Panama.

Directory of Open Access Journals (Sweden)

Denis Cedeno Moreno

2015-08-01

Solutions using information technology are an innovative way to manage the information of hospice patients in hospitals in Panama. The application of text mining techniques to the medical domain, especially to information from the electronic health records of patients in palliative care, is one of the most recent and promising research areas for the analysis of textual data. Text mining is based on extracting new knowledge from unstructured natural language data. We may also create ontologies to describe the terminology and knowledge in a given domain. An ontology is a formalized conceptualization of a domain, which may be general or specific. The extracted knowledge can be used for decision making by health specialists or can support research on improving the health system.
Text mining for literature review and knowledge discovery in cancer risk assessment and research.

Directory of Open Access Journals (Sweden)

Anna Korhonen

Research in biomedical text mining is starting to produce technology which can make information in biomedical literature more accessible for bio-scientists. One of the current challenges is to integrate and refine this technology to support real-life scientific tasks in biomedicine, and to evaluate its usefulness in the context of such tasks. We describe CRAB - a fully integrated text mining tool designed to support chemical health risk assessment. This task is complex and time-consuming, requiring a thorough review of existing scientific data on a particular chemical. Covering human, animal, cellular and other mechanistic data from various fields of biomedicine, this is highly varied and therefore difficult to harvest from literature databases via manual means. Our tool automates the process by extracting relevant scientific data in published literature and classifying it according to multiple qualitative dimensions. Developed in close collaboration with risk assessors, the tool allows navigating the classified dataset in various ways and sharing the data with other users. We present a direct and user-based evaluation which shows that the technology integrated in the tool is highly accurate, and report a number of case studies which demonstrate how the tool can be used to support scientific discovery in cancer risk assessment and research. Our work demonstrates the usefulness of a text mining pipeline in facilitating complex research tasks in biomedicine. We discuss further development and application of our technology to other types of chemical risk assessment in the future.
NWTS program criteria for mined geologic disposal of nuclear waste: repository performance and development criteria. Public draft

Energy Technology Data Exchange (ETDEWEB)

None

1982-07-01

This document, DOE/NWTS-33(3) is one of a series of documents to establish the National Waste Terminal Storage (NWTS) program criteria for mined geologic disposal of high-level radioactive waste. For both repository performance and repository development it delineates the criteria for design performance, radiological safety, mining safety, long-term containment and isolation, operations, and decommissioning. The US Department of Energy will use these criteria to guide the development of repositories to assist in achieving performance and will reevaluate their use when the US Nuclear Regulatory Commission issues radioactive waste repository rules.
NWTS program criteria for mined geologic disposal of nuclear waste: repository performance and development criteria. Public draft

International Nuclear Information System (INIS)

1982-07-01

This document, DOE/NWTS-33(3) is one of a series of documents to establish the National Waste Terminal Storage (NWTS) program criteria for mined geologic disposal of high-level radioactive waste. For both repository performance and repository development it delineates the criteria for design performance, radiological safety, mining safety, long-term containment and isolation, operations, and decommissioning. The US Department of Energy will use these criteria to guide the development of repositories to assist in achieving performance and will reevaluate their use when the US Nuclear Regulatory Commission issues radioactive waste repository rules
Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

Science.gov (United States)

Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J

2014-01-01

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. Top balanced F-scores for gene, chemical and
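To make the Web-service idea in this abstract more concrete, the Python sketch below builds a minimal BioC-style XML request and parses annotations from a BioC-style response. It is only a sketch under stated assumptions: the element layout follows the general BioC collection/document/passage/annotation structure, while the sample response, the function names, and the document identifier are hypothetical. No HTTP call is made, because any service URL would be invented; in practice the request string would be sent in the body of an HTTP POST to the remote NER service.

```python
import xml.etree.ElementTree as ET


def build_bioc_request(doc_id, text):
    """Wrap one passage of text in a minimal BioC-style collection."""
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"
    ET.SubElement(collection, "date").text = ""
    ET.SubElement(collection, "key").text = ""
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text
    return ET.tostring(collection, encoding="unicode")


def parse_bioc_annotations(xml_string):
    """Collect type, text, offset and length for every annotation."""
    root = ET.fromstring(xml_string)
    results = []
    for ann in root.iter("annotation"):
        infons = {i.get("key"): i.text for i in ann.findall("infon")}
        location = ann.find("location")
        results.append({
            "type": infons.get("type"),
            "text": ann.findtext("text"),
            "offset": int(location.get("offset")),
            "length": int(location.get("length")),
        })
    return results


# Hypothetical response, as a remote NER service might return it.
SAMPLE_RESPONSE = """
<collection><source>example</source><date></date><key></key>
<document><id>doc1</id><passage><offset>0</offset>
<text>Arsenic induces TP53 expression.</text>
<annotation id="0"><infon key="type">Chemical</infon>
<location offset="0" length="7"/><text>Arsenic</text></annotation>
<annotation id="1"><infon key="type">Gene</infon>
<location offset="16" length="4"/><text>TP53</text></annotation>
</passage></document></collection>
"""

request_body = build_bioc_request("doc1", "Arsenic induces TP53 expression.")
for annotation in parse_bioc_annotations(SAMPLE_RESPONSE):
    print(annotation)
```

Keeping request construction and response parsing separate from transport reflects the point of the abstract: the pipeline only needs to agree with the remote service on the exchange format, not on its implementation.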
Decommissioning of facilities for mining and milling of radioactive ores and closeout of residues

International Nuclear Information System (INIS)

1994-01-01

The purpose of this report is to provide information to Member States in order to assist in planning and implementing the decommissioning/closeout of uranium mine/mill facilities, mines, tailings impoundments, mining debris piles, leach residues and unprocessed ore stockpiles. The report presents an overview of the factors involved in planning and implementing the decommissioning/closeout of uranium mine/mill facilities. The information applies to mines, mills, tailings piles, mining debris piles and leach residues that are present as operational, mothballed or abandoned projects, as well as to future mining and milling projects. The report identifies the major factors that need to be considered in the decommissioning/closeout activities, including regulatory considerations; decommissioning of the mine/mill buildings, structures and facilities; decommissioning/closeout of open pit and underground mines; decommissioning/closeout of tailings impoundments; decommissioning/closeout of mining debris piles, unprocessed ore and other contaminated material such as heap leach piles, in situ leach facilities and contaminated soils; restoration of the site, vicinity properties and groundwater; radiation protection and health and safety considerations; and an assessment of costs and post-decommissioning or post-closeout maintenance and monitoring needs. 55 refs, figs and tabs
Mining for associations between text and brain activation in a functional neuroimaging database

DEFF Research Database (Denmark)

Nielsen, Finn Årup; Hansen, Lars Kai; Balslev, D.

2004-01-01

We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach...
Text mining analysis of public comments regarding high-level radioactive waste disposal

International Nuclear Information System (INIS)

Kugo, Akihide; Yoshikawa, Hidekazu; Shimoda, Hiroshi; Wakabayashi, Yasunaga

2005-01-01

In order to narrow the risk perception gap, as seen in social investigations, between the general public and people who are involved in the nuclear industry, public comments on high-level radioactive waste (HLW) disposal were solicited to find the significant talking points with the general public for constructing an effective risk communication model of social risk information regarding HLW disposal. Text mining was introduced to examine the public comments and identify the core public interest underlying them. The text mining method used clusters specific groups of words with negative meanings and then analyzes public understanding by employing text structural analysis to extract words from subjective expressions. Using these procedures, it was found that the public does not trust the nuclear fuel cycle promotion policy and shows signs of anxiety about the long-lasting technological reliability of waste storage. To develop effective social risk communication of HLW issues, these findings are expected to help experts in the nuclear industry to communicate with the general public more effectively to obtain their trust. (author)
Beating Obesity: Factors Associated with Interest in Workplace Weight Management Assistance in the Mining Industry.

Science.gov (United States)

Street, Tamara D; Thomas, Drew L

2017-03-01

Rates of overweight and obese Australians are high and continue to rise, putting a large proportion of the population at risk of chronic illness. Examining characteristics associated with preference for a work-based weight-loss program will enable employers to better target programs to increase enrolment and benefit employees' health and fitness for work. A cross-sectional survey was undertaken at two Australian mining sites. The survey collected information on employee demographics, health characteristics, work characteristics, stages of behavior change, and preference for workplace assistance with reaching a healthy weight. A total of 897 employees participated; 73.7% were male, and 68% had a body mass index in the overweight or obese range. Employees at risk of developing obesity-related chronic illnesses (based on high body mass index) were more likely to report preference for weight management assistance than lower risk employees. This indicates that, even in the absence of workplace promotion for weight management, some at risk employees want workplace assistance. Employees who were not aware of a need to change their current nutrition or physical activity behaviors were less likely to seek assistance. This indicates that practitioners need to communicate the negative effects of excess weight and promote the benefits of a healthy lifestyle to increase the likelihood of weight management. Weight management programs should provide information, motivation, and trouble-shooting assistance to meet the needs of at-risk mining employees, including those who are attempting to change and maintain behaviors to achieve a healthy weight and be suitably fit for work.
From university research to innovation: Detecting knowledge transfer via text mining

Energy Technology Data Exchange (ETDEWEB)

Woltmann, S.; Clemmensen, L.; Alkærsig, L

2016-07-01

Knowledge transfer by universities is a top priority in innovation policy and a primary purpose for public research funding, due to being an important driver of technical change and innovation. Current empirical research on the impact of university research relies mainly on formal databases and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern recognition. Text samples for this purpose can include files containing social media contents, company websites and annual reports. The empirical focus in the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent text collections (corpora) to identify correlations of university publications and company webpages. One corpus represents the company sites, serving as a sample of the private economy, and a second corpus, containing relevant publications, provides the reference to the university research. We associated the former with the latter to obtain insights into possible text and semantic relatedness. The text mining methods extrapolate the correlations, semantic patterns and content comparison of the two corpora to define the document relatedness. We expect the development of a novel tool using contemporary techniques for the measurement of public research impact. The approach aims to be applicable across universities and thus enable a more holistic comparable assessment. It relies less on formal databases, which is certainly beneficial in terms of the data reliability. We seek to provide a supplementary perspective for the detection of the dissemination of university research and hereby enable policy makers to gain additional insights of (informal) contributions of knowledge
A cost comparison study of open pit mining vs. in situ assisted gravity drainage

International Nuclear Information System (INIS)

McIntosh, J.; Luhning, R.W.

1991-01-01

The twin-well steam assisted gravity drainage (SAGD) process has resulted in breakthrough technology to access previously uneconomical deep-seated oil sands reserves in Alberta, and to provide a very cost-effective and environmentally acceptable method for extracting bitumen from reserves having a minimum of 30 m overburden. In the evaluation of new or improved bitumen recovery technologies for its new North Mine, Syncrude Canada has recognized that SAGD was a potential alternate to the current open pit mining and hot water extraction process. A study was conducted to compare and evaluate bitumen recovery by the two schemes at the North Mine site, scheduled to begin operations in 1996, for the reserves under Syncrude's tailings pond, and at a new grassroots area. Study description and analysis of results are presented for the grassroots case. The assumptions and mining/recovery processes used for the mining or SAGD method are detailed and the advantages and drawbacks of each scheme are noted. Results show that the SAGD unit supply costs are projected to be proportionately lower than the corresponding open pit mining/hot water extraction (OP/X) cost, using a 20-y project life. A sensitivity analysis indicates that the SAGD process is more sensitive to natural gas costs, while the OP/X scheme is more sensitive to power costs. The SAGD process is much less labor-intensive than OP/X and has obvious advantages in terms of tailings disposal and post-mining reclamation. In addition, the underground nature of SAGD operation eliminates adverse effects of the weather on working conditions. 11 figs
Weighted mining of massive collections of p-values by convex optimization.

Science.gov (United States)

Dobriban, Edgar

2018-06-01

Researchers in data-rich disciplines (think of computational genomics and observational cosmology) often wish to mine large bodies of p-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the p-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
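The idea of upweighting and downweighting hypotheses can be illustrated with the classical weighted Bonferroni procedure, sketched below in Python. This is not the Princessp method and involves no convex optimization; it only shows how weights that average to one shift per-hypothesis rejection thresholds while the family-wise error rate stays controlled at `alpha`. The function name and the example p-values and weights are assumptions for illustration.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha * w_i / m.

    Weights are rescaled to average 1, so the thresholds sum to alpha.
    """
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    normalized = [w * m / total for w in weights]
    return [p <= alpha * w / m for p, w in zip(p_values, normalized)]


p_values = [0.02, 0.03, 0.001, 0.5]

# Equal weights reduce to the ordinary Bonferroni threshold alpha / m.
print(weighted_bonferroni(p_values, [1, 1, 1, 1]))
# [False, False, True, False]

# Upweighting the first hypothesis and discarding the last changes the outcome.
print(weighted_bonferroni(p_values, [2, 1, 1, 0]))
# [True, False, True, False]
```

The question the abstract addresses, how to choose the weights well under constraints, is exactly what this sketch leaves open.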
Tracing Knowledge Transfer from Universities to Industry: A Text Mining Approach

DEFF Research Database (Denmark)

Woltmann, Sabrina; Alkærsig, Lars

2017-01-01

This paper identifies transferred knowledge between universities and the industry by proposing the use of a computational linguistic method. Current research on university-industry knowledge exchange relies often on formal databases and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surrounding of universities. We, on the other hand, use the texts from university abstracts to identify university knowledge and compare them with texts from firm webpages. We use these text data to identify common key words and thereby identify overlapping contents among the texts. As method we use a well-established word ranking method from the field of information retrieval, term frequency–inverse document frequency (TFIDF), to identify commonalities between texts from university. In examining the outcomes of the TFIDF statistic we find... is the first step to enable the identification of common knowledge and knowledge transfer via text mining to increase its measurability.
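The TFIDF comparison mentioned in this abstract can be sketched in a few lines of Python using only the standard library. The sketch assumes the common definition tf × log(N / df); the toy documents and the functions `tokenize`, `tfidf`, and `shared_keywords` are hypothetical and are not taken from the paper. Terms that occur in every document receive a weight of zero, so only distinctive shared terms are reported as overlap.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def shared_keywords(vec_a, vec_b):
    """Terms with a positive TFIDF weight in both documents."""
    return sorted(t for t, w in vec_a.items() if w > 0 and vec_b.get(t, 0) > 0)


documents = [
    "wind turbine blade fatigue models for the offshore sector",  # university abstract
    "we sell the wind turbine monitoring software",               # firm webpage
    "the bakery sells bread",                                     # unrelated background text
    "the clinic treats patients",                                 # unrelated background text
]
vectors = tfidf(documents)
print(shared_keywords(vectors[0], vectors[1]))  # ['turbine', 'wind']
```

The background documents matter: with only two texts, every shared term would appear in all documents and receive zero weight.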
Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

Energy Technology Data Exchange (ETDEWEB)

Hirdt, J.A. [Department of Mathematics and Computer Science, St. Joseph's College, Patchogue, NY 11772 (United States); Brown, D.A., E-mail: dbrown@bnl.gov [National Nuclear Data Center, Brookhaven National Laboratory, Upton, NY 11973-5000 (United States)

2016-01-15

The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
Identifying Understudied Nuclear Reactions by Text-mining the EXFOR Experimental Nuclear Reaction Library

International Nuclear Information System (INIS)

Hirdt, J.A.; Brown, D.A.

2016-01-01

The EXFOR library contains the largest collection of experimental nuclear reaction data available as well as the data's bibliographic information and experimental details. We text-mined the REACTION and MONITOR fields of the ENTRYs in the EXFOR library in order to identify understudied reactions and quantities. Using the results of the text-mining, we created an undirected graph from the EXFOR datasets with each graph node representing a single reaction and quantity and graph links representing the various types of connections between these reactions and quantities. This graph is an abstract representation of the connections in EXFOR, similar to graphs of social networks, authorship networks, etc. We use various graph theoretical tools to identify important yet understudied reactions and quantities in EXFOR. Although we identified a few cross sections relevant for shielding applications and isotope production, mostly we identified charged particle fluence monitor cross sections. As a side effect of this work, we learn that our abstract graph is typical of other real-world graphs.
A systems biology approach to construct the gene regulatory network of systemic inflammation via microarray and databases mining

Directory of Open Access Journals (Sweden)

Lan Chung-Yu

2008-09-01

Background: Inflammation is a hallmark of many human diseases. Elucidating the mechanisms underlying systemic inflammation has long been an important topic in basic and clinical research. When primary pathogenetic events remain unclear due to their immense complexity, construction and analysis of the gene regulatory network of inflammation at times becomes the best way to understand the detrimental effects of disease. However, it is difficult to recognize and evaluate relevant biological processes from the huge quantities of experimental data. It is hence appealing to find an algorithm which can generate a gene regulatory network of systemic inflammation from high-throughput genomic studies of human diseases. Such a network will be essential for us to extract valuable information from the complex and chaotic network under diseased conditions. Results: In this study, we construct a gene regulatory network of inflammation using data extracted from the Ensembl and JASPAR databases. We also integrate and apply a number of systematic algorithms like cross correlation threshold, maximum likelihood estimation method and Akaike Information Criterion (AIC) on time-lapsed microarray data to refine the genome-wide transcriptional regulatory network in response to bacterial endotoxins in the context of dynamic activated genes, which are regulated by transcription factors (TFs) such as NF-κB. This systematic approach is used to investigate the stochastic interaction represented by the dynamic leukocyte gene expression profiles of human subjects exposed to an inflammatory stimulus (bacterial endotoxin). Based on the kinetic parameters of the dynamic gene regulatory network, we identify important properties (such as susceptibility to infection) of the immune system, which may be useful for translational research. Finally, robustness of the inflammatory gene network is also inferred by analyzing the hubs and "weak ties" structures of the gene network
Text mining to decipher free-response consumer complaints: insights from the NHTSA vehicle owner's complaint database.

Science.gov (United States)

Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D

2014-09-01

This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.

MinePath: Mining for Phenotype Differential Sub-paths in Molecular Pathways

Science.gov (United States)

Koumakis, Lefteris; Kartsaki, Evgenia; Chatzimina, Maria; Zervakis, Michalis; Vassou, Despoina; Marias, Kostas; Moustakis, Vassilis; Potamias, George

2016-01-01

Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), view pathways just as plain lists of genes without taking into account either the underlying pathway network topology or the involved gene regulatory relations. These approaches, even if they achieve computational efficiency and simplicity, consider pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. More recent pathway analysis approaches take into account the underlying gene regulatory relations by examining their consistency with gene expression profiles and computing a score for each profile. Even with this approach, assessing and scoring single relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses and overcomes the aforementioned problems. MinePath facilitates the decomposition of pathways into their constituent sub-paths. Decomposition leads to the transformation of single relations to complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess phenotype differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison results with state-of-the-art pathway analysis systems are indicative of the soundness and reliability of the MinePath approach. In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the
The contribution of the vaccine adverse event text mining system to the classification of possible Guillain-Barré syndrome reports.

Science.gov (United States)

Botsis, T; Woo, E J; Ball, R

2013-01-01

We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. The objective of this study was to evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority.
The Contribution of the Vaccine Adverse Event Text Mining System to the Classification of Possible Guillain-Barré Syndrome Reports

Science.gov (United States)

Botsis, T.; Woo, E. J.; Ball, R.

2013-01-01

Background: We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of anaphylaxis for post-marketing safety surveillance of vaccines. Objective: To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). Methods: We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. Results: MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. Conclusion: For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority. PMID:23650490
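The two evaluation metrics quoted in this record, AUC and specificity, can be illustrated with a minimal sketch. The scores, labels and the 0.45 decision threshold below are made up for illustration and have nothing to do with the VAERS data; the function names are hypothetical and are not part of VaeTM.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case scores higher.
    Tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity(predictions, labels):
    """True negatives divided by all actual negatives."""
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    return tn / (tn + fp)


# Hypothetical classifier scores (1 = report meets the case definition).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
# Assumed threshold: scores of 0.45 or more are predicted positive.
predictions = [1 if s >= 0.45 else 0 for s in scores]

print(round(auc(scores, labels), 3), specificity(predictions, labels))
```

AUC summarizes ranking quality over all thresholds, whereas specificity depends on the single threshold chosen, which is why one approach can lead on AUC while the other leads on specificity.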
Classifying unstructured textual data using the Product Score Model: an alternative text mining algorithm

NARCIS (Netherlands)

He, Qiwei; Veldkamp, Bernard P.; Eggen, T.J.H.M.; Veldkamp, B.P.

2012-01-01

Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which creates difficulties in analysis. Text mining techniques that seek to extract useful
An Enhanced Text-Mining Framework for Extracting Disaster Relevant Data through Social Media and Remote Sensing Data Fusion

Science.gov (United States)

Scheele, C. J.; Huang, Q.

2016-12-01

In the past decade, the rise in social media has led to the development of a vast number of social media services and applications. Disaster management represents one such application, leveraging the massive data generated for event detection, response, and recovery. In order to find disaster relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in language used on social media. To improve current methods, the enhanced text-mining framework is proposed to incorporate location information from social media and authoritative remote sensing datasets for detecting disaster relevant social media posts, which are determined by assessing the textual content using common text mining methods and how the post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarth™. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.
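As an illustration of the text-only baseline mentioned in this record, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It does not reproduce the authors' framework: the location and remote sensing components are omitted, and the training posts, labels and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (text, label) pairs.
    Returns class counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothed log likelihood of the word.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training posts, invented for this sketch.
training = [
    ("flood water rising on main street", "disaster"),
    ("tornado damaged houses near town", "disaster"),
    ("hurricane winds and flood warnings", "disaster"),
    ("great coffee on main street today", "other"),
    ("watching a movie near town tonight", "other"),
    ("new phone arrived today", "other"),
]
model = train_nb(training)
print(predict_nb(model, "flood near houses"))
print(predict_nb(model, "coffee today"))
```

A classifier of this kind sees only the words of a post, which is the limitation the record's framework tries to address by adding spatial and temporal evidence.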
Mining and Reclamation Technology Symposium

Energy Technology Data Exchange (ETDEWEB)

None Available

1999-06-24

The Mining and Reclamation Technology Symposium was commissioned by the Mountaintop Removal Mining/Valley Fill Environmental Impact Statement (EIS) Interagency Steering Committee as an educational forum for the members of the regulatory community who will participate in the development of the EIS. The Steering Committee sought a balanced audience to ensure the input to the regulatory community reflected the range of perspectives on this complicated and emotional issue. The focus of this symposium is on mining and reclamation technology alternatives, which is one of eleven topics scheduled for review to support development of the EIS. Others include hydrologic, environmental, ecological, and socio-economic issues.
[Exploring the clinical characters of Shugan Jieyu capsule through text mining].

Science.gov (United States)

Pu, Zheng-Ping; Xia, Jiang-Ming; Xie, Wei; He, Jin-Cai

2017-09-01

The study aimed to explore the clinical characteristics of Shugan Jieyu capsule through text mining. Data sets on Shugan Jieyu capsule were obtained from the CMCC database by retrieving literature published from May 2009 to January 2016. Rules for Chinese medical patterns, diseases, symptoms and combination treatments were mined with a data slicing algorithm and presented in frequency tables and two-dimensional networks. In total, 190 articles were included. The results suggested that Shugan Jieyu capsule was most frequently associated with liver Qi stagnation. Primary depression, depression due to brain disease, concomitant depression following physical diseases, concomitant depression following schizophrenia and functional dyspepsia were the main diseases treated with Shugan Jieyu capsule. Symptoms such as low mood, psychic anxiety, somatic anxiety and autonomic nerve dysfunction were the ones mainly relieved by Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. The research suggested that the syndrome types and mining results for Shugan Jieyu capsule were almost the same as those in its instructions. The syndrome of malnutrition of heart spirit was a potential Chinese medical pattern for Shugan Jieyu capsule. Primary comorbid anxiety and depression, concomitant comorbid anxiety and depression following physical diseases, and postpartum depression were potential diseases treated with Shugan Jieyu capsule. For combination treatment, Shugan Jieyu capsule was most commonly used with paroxetine, sertraline and fluoxetine. Copyright© by the Chinese Pharmaceutical Association.
Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

Science.gov (United States)

Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

2015-06-06

Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction, with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation-disease associations in curated database records. The discourse-level analysis component of MutD contributed a gain of more than 10% in F-measure when compared against sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation-disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating
Integrated Text Mining and Chemoinformatics Analysis Associates Diet to Health Benefit at Molecular Level

DEFF Research Database (Denmark)

Jensen, Kasper; Panagiotou, Gianni; Kouskoumvekaki, Irene

2014-01-01

, lipids and nutrients. In this work, we applied text mining and Naïve Bayes classification to assemble the knowledge space of food-phytochemical and food-disease associations, where we distinguish between disease prevention/amelioration and disease progression. We subsequently searched for frequently...
Uranium mining in the Canadian social environment in the eighties

International Nuclear Information System (INIS)

Dory, A.B.

1981-11-01

Factors considered by the author to be responsible for the image crisis being experienced by all types of mining are discussed. The additional problems introduced by the presence of radiation in uranium mining are detailed along with the associated regulatory concerns. The Canadian regulatory system as it pertains to uranium mining is outlined very generally, followed by the author's views on improving the image of both uranium mining and the nuclear industry as a whole
Information Retrieval and Text Mining Technologies for Chemistry.

Science.gov (United States)

Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

2017-06-28

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways.

Science.gov (United States)

Subramani, Suresh; Kalpana, Raja; Monickaraj, Pankaj Moses; Natarajan, Jeyakumar

2015-04-01

Knowledge of protein-protein interactions (PPI) and of their related pathways is equally important for understanding the biological functions of the living cell. Such information on human proteins is highly desirable for understanding the mechanisms of several diseases such as cancer, diabetes, and Alzheimer's disease. Because much of that information is buried in biomedical literature, an automated text mining system for visualizing human PPI and pathways is highly desirable. In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature. HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG. To our knowledge, HPIminer is the first system to build interaction networks from literature as well as curated databases. Further, the new interactions mined only from literature and not reported earlier in databases are highlighted as new. A comparative study with other similar tools shows that the resultant network is more informative and provides additional information on interacting proteins and their associated networks. Copyright © 2015 Elsevier Inc. All rights reserved.
pubmed.mineR: An R package with text-mining algorithms to ...

Indian Academy of Sciences (India)

2016-08-26

Three case studies are presented, namely, 'Evolving role of diabetes educators', 'Cancer risk assessment' and 'Dynamic concepts on disease and comorbidity' to illustrate the use of pubmed.mineR. The package generally runs fast with small elapsed times in regular workstations even on large corpus ...
The WONP-NURT corpus as nuclear knowledge base for text mining in the INIS database

International Nuclear Information System (INIS)

Guerra Valdes, R.

2011-01-01

In the present work the WONP-NURT corpus is taken as a knowledge base for text mining in the INIS database. Main components of the information processing system, as well as computational methods for content analysis of INIS database record files, are described. Results of the content analysis of the WONP-NURT corpus are reported. Furthermore, results of two comparative text mining studies in the INIS database are also shown. The first one explores 10 research areas in the more familiar nearest range of WONP-NURT corpus, while the second one surveys 15 regions in the more exotic far range. The results provide new elements to assess the significance of the WONP-NURT corpus in the context of the current state of nuclear science and technology research areas. (Author)
Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

Science.gov (United States)

Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R

2015-12-01

This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.
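The F-Score reported in this record is the harmonic mean of precision and recall. A minimal sketch of the calculation follows; the counts are hypothetical and are not taken from the i2b2/UTHealth evaluation.

```python
def f_score(tp, fp, fn):
    """Balanced F-score (F1) from true positive, false positive
    and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct extractions, 10 spurious, 20 missed.
print(round(f_score(80, 10, 20), 3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-score by trading away recall for precision or the reverse.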
What Online Communities Can Tell Us About Electronic Cigarettes and Hookah Use: A Study Using Text Mining and Visualization Techniques

OpenAIRE

Chen, Annie T; Zhu, Shu-Hong; Conway, Mike

2015-01-01

© 2015 Journal of Medical Internet Research. Background: The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people's experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a no...
A Review of Mine Rescue Ensembles for Underground Coal Mining in the United States.

Science.gov (United States)

Kilinc, F Selcen; Monaghan, William D; Powell, Jeffrey B

The mining industry is among the top ten industries nationwide with high occupational injury and fatality rates, and mine rescue response may be considered one of the most hazardous activities in mining operations. In the aftermath of an underground mine fire, explosion or water inundation, specially equipped and trained teams have been sent underground to fight fires, rescue entrapped miners, test atmospheric conditions, investigate the causes of the disaster, or recover the dead. Special personal protective ensembles are used by the team members to improve the protection of rescuers against the hazards of mine rescue and recovery. Personal protective ensembles used by mine rescue teams consist of helmet, cap lamp, hood, gloves, protective clothing, boots, kneepads, facemask, breathing apparatus, belt, and suspenders. While improved technology such as wireless warning and communication systems, lifeline pulleys, and lighted vests have been developed for mine rescuers over the last 100 years, recent research in this area of personal protective ensembles has been minimal due to the trending of reduced exposure of rescue workers. In recent years, the exposure of mine rescue teams to hazardous situations has been changing. However, it is vital that members of the teams have the capability and proper protection to immediately respond to a wide range of hazardous situations. Currently, there are no minimum requirements, best practice documents, or nationally recognized consensus standards for protective clothing used by mine rescue teams in the United States (U.S.). The following review provides a summary of potential issues that can be addressed by rescue teams and industry to improve potential exposures to rescue team members should a disaster situation occur. However, the continued trending in the mining industry toward non-exposure to potential hazards for rescue workers should continue to be the primary goal. To assist in continuing this trend, the mining industry
Legislative and Regulatory Control for the Safety of Radioactively Contaminated Scrap Metals Generated from Mining and Mineral Processing Facilities in South Africa

Energy Technology Data Exchange (ETDEWEB)

Mohajane, E. P.; Shale, K., E-mail: PEMohajane@nnr.co.za [National Nuclear Regulator, Centurion, Gauteng (South Africa)

2011-07-15

In South Africa, enhanced levels of naturally occurring radioactive materials (NORM) are associated with many mining and industrial processes. Significant amounts of waste materials are involved which can result in radiation exposure of the workers and the public particularly through the diversion of materials into the public domain. The following operations have been regulated in South Africa for the past twenty years: operating metallurgical plants utilizing NORM, underground mining operations, scrap recyclers and smelters, and rehabilitation and remediation activities involving the above sites. The radioactively contaminated scrap metal generated from the above mentioned facilities is available for recycling in amounts of thousands of tons. The South African government has, to a certain extent, responded to the above-mentioned challenges by introducing regulatory controls to the affected industries. The existing regulatory controls have, however, not provided absolute answers to all issues associated with the management of scrap. (author)
75 FR 48525 - Pennsylvania Regulatory Program

Science.gov (United States)

2010-08-10

... maintain jurisdiction of the regulatory program under the Federal Surface Mining Control and Reclamation... Part IV Department of the Interior Office of Surface Mining Reclamation and Enforcement 30 CFR... Surface Mining Reclamation and Enforcement 30 CFR Part 938 [PA-153; Docket ID OSM-2008-0021] Pennsylvania...
Accounting for water in the minerals industry: Capitalising on regulatory reporting

Directory of Open Access Journals (Sweden)

Rikki A. Garstone

2017-12-01

Australia has been rapidly advancing the field of water accounting as a tool to improve water management across the country. Water accounting is the application of a consistent and structured approach to identify, measure and report water resource information. The Bureau of Meteorology (the Bureau) has developed Australian Water Accounting Standards for General Purpose Water Accounting Reports. Following collaboration between the Bureau and the Newmarket Gold Mining Company, this paper investigates how General Purpose Water Account Reporting can be applied and used in the minerals industry to simplify and improve aspects of regulatory reporting. This case study demonstrates how General Purpose Water Accounting Reports and the lessons learned from the ongoing development of the Australian National Water Account can be practically applied to regulatory reporting and corporate data management for a mining operation in the Australian Northern Territory. This paper also demonstrates the benefits of aligning a standardised water account with data that is already routinely collected as part of a mining operation's environmental compliance.

European Union International Cooperation to Improve Regulatory Effectiveness in Nuclear Safety

International Nuclear Information System (INIS)

Stockmann, Y.

2016-01-01

The European Union (EU) promotes a high level of nuclear safety worldwide, through the "Instrument for Nuclear Safety Cooperation" (INSC) since 2007. The INSC builds on the experience gained under the completed "Technical Assistance to the Commonwealth of Independent States" Programme (TACIS) from 1991. Development and strengthening of national Regulatory Authorities’ capabilities is a key activity in achieving the INSC goals, in particular in countries with or embarking on nuclear power. Specific partner countries under INSC include countries of all types of maturity in the nuclear technology, with mature countries such as Brazil, Mexico and Ukraine, countries with waste and mining issues, but no direct intention of embarking on nuclear power such as Georgia, Mongolia, Tajikistan, Kyrgyzstan and Tanzania and countries planning to embark on nuclear power such as Belarus, Egypt, Jordan and Vietnam. For new projects, the main focus is on the neighbourhood of the EU. The EU cooperation within INSC encompasses measures to support the promotion of high standards in radiation protection, radioactive waste management, decommissioning, remediation of contaminated sites, and efficient and effective safeguards of nuclear material. The INSC regulatory support is aimed at continuous assistance to Nuclear Regulatory Authorities (NRAs), including their technical support organisations (TSOs), in order to reinforce the regulatory framework, notably concerning licensing activities.
Biodiversity loss from deep-sea mining

OpenAIRE

C. L. Van Dover; J. A. Ardron; E. Escobar; M. Gianni; K. M. Gjerde; A. Jaeckel; D. O. B. Jones; L. A. Levin; H. Niner; L. Pendleton; C. R. Smith; T. Thiele; P. J. Turner; L. Watling; P. P. E. Weaver

2017-01-01

The emerging deep-sea mining industry is seen by some to be an engine for economic development in the maritime sector. The International Seabed Authority (ISA) – the body that regulates mining activities on the seabed beyond national jurisdiction – must also protect the marine environment from harmful effects that arise from mining. The ISA is currently drafting a regulatory framework for deep-sea mining that includes measures for environmental protection. Responsible mining increasingly stri...
Literature Mining Methods for Toxicology and Construction of ...

Science.gov (United States)

Webinar Presentation on text-mining methodologies in use at NCCT and how they can be used to assist with the OECD Retinoid project. Presentation to 1st Workshop/Scientific Expert Group meeting on the OECD Retinoid Project - April 26, 2016 –Brussels, Presented remotely via web.
Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends.

Science.gov (United States)

Jurca, Gabriela; Addam, Omar; Aksac, Alper; Gao, Shang; Özyer, Tansel; Demetrick, Douglas; Alhajj, Reda

2016-04-26

Breast cancer is a serious disease which affects many women and may lead to death. It has received considerable attention from the research community. Thus, biomedical researchers aim to find genetic biomarkers indicative of the disease. Novel biomarkers can be elucidated from the existing literature. However, the vast number of scientific publications on breast cancer makes this a daunting task. This paper presents a framework which investigates existing literature data for informative discoveries. It integrates text mining and social network analysis in order to identify new potential biomarkers for breast cancer. We utilized PubMed for the testing. We investigated gene-gene interactions, as well as novel interactions such as gene-year, gene-country, and abstract-country, to find out how the discoveries varied over time and how overlapping or diverse the discoveries and the interests of various research groups in different countries are. Interesting trends have been identified and discussed, e.g., different genes are highlighted in relationship to different countries though the various genes were found to share functionality. Some text analysis based results have been validated against results from other tools that predict gene-gene relations and gene functions.
30 CFR 785.14 - Mountaintop removal mining.

Science.gov (United States)

2010-07-01

... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Mountaintop removal mining. 785.14 Section 785.14 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SURFACE COAL MINING AND RECLAMATION OPERATIONS PERMITS AND COAL EXPLORATION SYSTEMS UNDER REGULATORY...
SWIFT-Review: a text-mining workbench for systematic review.

Science.gov (United States)

Howard, Brian E; Phillips, Jason; Miller, Kyle; Tandon, Arpit; Mav, Deepak; Shah, Mihir R; Holmgren, Stephanie; Pelch, Katherine E; Walker, Vickie; Rooney, Andrew A; Macleod, Malcolm; Shah, Ruchir R; Thayer, Kristina

2016-05-23

effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation. Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
Towards Technological Approaches for Concept Maps Mining from Text

Directory of Open Access Journals (Sweden)

Camila Zacche Aguiar

2018-04-01

Concept maps are resources for the representation and construction of knowledge. They allow showing, through concepts and relationships, how knowledge about a subject is organized. Technological advances have boosted the development of approaches for the automatic construction of a concept map, to facilitate and provide the benefits of that resource more broadly. Due to the need to better identify and analyze the functionalities and characteristics of those approaches, we conducted a detailed study on technological approaches for automatic construction of concept maps published between 1994 and 2016 in the IEEE Xplore, ACM and Elsevier Science Direct databases. From this study, we elaborate a categorization defined on two perspectives, Data Source and Graphic Representation, and fourteen categories. The study collected 30 relevant articles, to which the proposed categorization was applied to identify the main features and limitations of each approach. A detailed view of these approaches, their characteristics and techniques is presented, enabling a quantitative analysis. In addition, the categorization has given us objective conditions to establish new specification requirements for a new technological approach aiming at concept maps mining from texts.
30 CFR 785.15 - Steep slope mining.

Science.gov (United States)

2010-07-01

... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Steep slope mining. 785.15 Section 785.15 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SURFACE COAL MINING AND RECLAMATION OPERATIONS PERMITS AND COAL EXPLORATION SYSTEMS UNDER REGULATORY PROGRAMS...
Stopping Antidepressants and Anxiolytics as Major Concerns Reported in Online Health Communities: A Text Mining Approach.

Science.gov (United States)

Abbe, Adeline; Falissard, Bruno

2017-10-23

The Internet is a particularly dynamic way to quickly capture the perceptions of a population in real time. Complementary to traditional face-to-face communication, online social networks help patients to improve self-esteem and self-help. The aim of this study was to use text mining on material from an online forum exploring patients' concerns about treatment (antidepressants and anxiolytics). Concerns about treatment were collected from discussion titles in a patients' online community related to antidepressants and anxiolytics. To examine the content of these titles automatically, we used text mining methods, such as word frequency in a document-term matrix and co-occurrence of words using a network analysis. It was thus possible to identify topics discussed on the forum. The forum included 2415 discussions on antidepressants and anxiolytics over a period of 3 years. After a preprocessing step, the text mining algorithm identified the 99 most frequently occurring words in titles, among which were escitalopram, withdrawal, antidepressant, venlafaxine, paroxetine, and effect. Patients' concerns were related to antidepressant withdrawal, the need to share experience about symptoms, effects, and questions on weight gain with some drugs. Patients' expression on the Internet is a potential additional resource in addressing patients' concerns about treatment. Patient profiles are close to those of patients treated in psychiatry. ©Adeline Abbe, Bruno Falissard. Originally published in JMIR Mental Health (http://mental.jmir.org), 23.10.2017.
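The two techniques named in this abstract, word frequency from a document-term matrix and word co-occurrence, can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the titles, the whitespace tokenizer and all function names are assumptions made for the example and are not taken from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical discussion titles; NOT data from the study.
TITLES = [
    "escitalopram withdrawal symptoms",
    "venlafaxine withdrawal effect",
    "paroxetine weight gain",
    "escitalopram weight gain effect",
]


def tokenize(title):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return title.lower().split()


def document_term_matrix(titles):
    """Return (vocab, rows) where rows[i][j] counts term vocab[j] in title i."""
    docs = [Counter(tokenize(t)) for t in titles]
    vocab = sorted(set().union(*docs))
    rows = [[doc[term] for term in vocab] for doc in docs]
    return vocab, rows


def term_frequencies(vocab, rows):
    """Sum each column of the matrix to get corpus-wide term counts."""
    return {term: sum(row[j] for row in rows) for j, term in enumerate(vocab)}


def cooccurrence(titles):
    """Count how many titles contain each unordered pair of distinct terms."""
    pairs = Counter()
    for title in titles:
        for a, b in combinations(sorted(set(tokenize(title))), 2):
            pairs[(a, b)] += 1
    return pairs


vocab, rows = document_term_matrix(TITLES)
freq = term_frequencies(vocab, rows)
edges = cooccurrence(TITLES)
```

The pair counts in `edges` can serve as edge weights of a co-occurrence network, with terms as nodes; any network layout or community detection step would come on top of that and is not shown here.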
Analysis of Nature of Science Included in Recent Popular Writing Using Text Mining Techniques

Science.gov (United States)

Jiang, Feng; McComas, William F.

2014-01-01

This study examined the inclusion of nature of science (NOS) in popular science writing to determine whether it could serve as a supplementary resource for teaching NOS and to evaluate the accuracy of text mining and classification as a viable research tool in science education research. Four groups of documents published from 2001 to 2010 were…
Mining toward the year 2000

International Nuclear Information System (INIS)

Anon.

1981-01-01

Mining in South Africa to this present day has not been a case of dramatic development, rather a steady technical progress, assisted by a rising product market price. Prominent men in the mining industry look at the future in terms of that logical development. Coverage is given to gold, mine unionization, coal, rock bursts, ventilation, uranium and ocean mining
PROGRAMS WITH DATA MINING CAPABILITIES

Directory of Open Access Journals (Sweden)

Ciobanu Dumitru

2012-03-01

Full Text Available The fact that the Internet has become a commodity in the world has created a framework for a new economy. Traditional businesses migrate to this new environment that offers many features and options at relatively low prices. However, competition is fierce and successful Internet business is tied to rigorous use of all available information. The information is often hidden in data, and retrieving it requires software capable of applying data mining algorithms and techniques. In this paper we review some of the programs with data mining capabilities currently available in this area. We also propose some classifications of this software to assist those who wish to use such software.
Influence of ergonomic design on trackless mining machines on the health and safety of the operators, drivers and workers.

CSIR Research Space (South Africa)

Mason, S

1998-07-01

Full Text Available The project has produced information and methodologies for use by designers, mine managers and engineers to improve the health and safety associated with the use of trackless vehicles in mines. The project deliverables focus on assisting; designers...
Data mining of text as a tool in authorship attribution

Science.gov (United States)

Visa, Ari J. E.; Toivonen, Jarmo; Autio, Sami; Maekinen, Jarno; Back, Barbro; Vanharanta, Hannu

2001-03-01

It is common that text documents are characterized and classified by keywords that the authors assign to them. Visa et al. have developed a new methodology based on prototype matching. The prototype is an interesting document or a part of an extracted, interesting text. This prototype is matched with the document database of the monitored document flow. The new methodology is capable of extracting the meaning of the document to a certain degree. Our claim is that the new methodology is also capable of authenticating the authorship. To verify this claim two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test three authors were selected. The selected authors were William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Every text was used, one by one, as a prototype. The two nearest matches with the prototype were noted. The second test uses the Reuters-21578 financial news database. A group of 25 short financial news reports from five different authors is examined. Our new methodology and the interesting results from the two tests are reported in this paper. In the first test, for Shakespeare and for Poe all cases were successful. For Shaw one text was confused with Poe. In the second test the Reuters-21578 financial news reports were identified by author relatively well. The conclusion is that our text mining methodology seems to be capable of authorship attribution.
Comparison between BIDE, PrefixSpan, and TRuleGrowth for Mining of Indonesian Text

Science.gov (United States)

Sa'adillah Maylawati, Dian; Irfan, Mohamad; Budiawan Zulfikar, Wildan

2017-01-01

The mining process for the Indonesian language is still an interesting research topic. Multiple-word representation has been claimed to keep the meaning of text better than bag of words. In this paper, we compare several sequential pattern algorithms, namely BIDE (BIDirectional Extension), PrefixSpan, and TRuleGrowth. All of these algorithms produce frequent word sequences to keep the meaning of text. However, the experimental results, with 14,006 Indonesian tweets from Twitter, show that BIDE can produce more efficient frequent word sequences than PrefixSpan and TRuleGrowth without losing the meaning of text. The average processing time of PrefixSpan is lower than that of BIDE and TRuleGrowth. On the other hand, PrefixSpan and TRuleGrowth use memory more efficiently than BIDE.
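To make the notion of a frequent word sequence concrete, the sketch below implements the basic PrefixSpan idea (grow a prefix one word at a time and recurse on the projected suffixes) for sequences of single words. It is a simplified illustration, not the implementation compared in the paper: the example tweets, the function name and the support threshold are assumptions, and BIDE and TRuleGrowth are not shown.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for frequent word sequences.

    Each sequence is a list of single words, so a pattern is an ordered
    subsequence (gaps allowed). Support is the number of sequences that
    contain the pattern.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence_index, start_position) pairs: the part of
        # each sequence that follows the earliest match of the prefix.
        occurrences = defaultdict(dict)
        for seq_idx, start in projected:
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                # Keep only the earliest position of each word per sequence.
                occurrences[seq[pos]].setdefault(seq_idx, pos)
        for word, hits in occurrences.items():
            if len(hits) >= min_support:
                pattern = prefix + (word,)
                results[pattern] = len(hits)
                mine(pattern, [(i, p + 1) for i, p in hits.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical tokenized tweets; NOT data from the paper.
tweets = [
    ["harga", "bensin", "naik"],
    ["harga", "naik"],
    ["bensin", "harga", "naik"],
]
patterns = prefixspan(tweets, min_support=2)
```

With this toy input the sequence `("harga", "naik")` is supported by all three tweets, while `("harga", "bensin")` appears in only one and is pruned. BIDE differs in that it reports only closed patterns, which is one reason its output can be more compact.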
The Cogemagazine reviews. The rehabilitation of mining sites in France

International Nuclear Information System (INIS)

Loriot, O.; Bof, M.; Villeneuve, A.

1998-02-01

The French uranium mines are progressively closing down. After a mining division has closed down, the main objectives of the Cogema group are: ensuring the long-term safety and healthiness of the site, reducing the residual impacts, preventing any abusive intrusion, reducing the surface of land submitted to right-of-way, encouraging the reconversion of the site, and succeeding in the integration of the site in the landscape in agreement with the local authorities. This brochure presents the strategy followed by Cogema for the rehabilitation of its sites: the French mining concessions and the uranium extraction and processing techniques, the storage of tailings and processing residues, the environment protection and the respect of regulation (environmental surveillance, working groups, administrative procedures and regulatory texts, impact studies...), the backfilling and safety of underground mines, the cost studies for the rehabilitation of open cast mines, the dismantling of factories, the confinement of residues and the revegetation, the continuous monitoring of the rehabilitated sites (water, atmosphere, food...). (J.S.)
Appraising the Corporate Sustainability Reports - Text Mining and Multi-Discriminatory Analysis

Science.gov (United States)

Modapothala, J. R.; Issac, B.; Jayamani, E.

The voluntary disclosure of the sustainability reports by the companies attracts wider stakeholder groups. Diversity in these reports poses challenge to the users of information and regulators. This study appraises the corporate sustainability reports as per GRI (Global Reporting Initiative) guidelines (the most widely accepted and used) across all industrial sectors. Text mining is adopted to carry out the initial analysis with a large sample size of 2650 reports. Statistical analyses were performed for further investigation. The results indicate that the disclosures made by the companies differ across the industrial sectors. Multivariate Discriminant Analysis (MDA) shows that the environmental variable is a greater significant contributing factor towards explanation of sustainability report.
Text mining applications in psychiatry: a systematic literature review.

Science.gov (United States)

Abbe, Adeline; Grouin, Cyril; Zweigenbaum, Pierre; Falissard, Bruno

2016-06-01

The expansion of biomedical literature is creating the need for efficient tools to keep pace with increasing volumes of information. Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text. We reviewed the applications of TM in psychiatry, and explored its advantages and limitations. A systematic review of the literature was carried out using the CINAHL, Medline, EMBASE, PsycINFO and Cochrane databases. In this review, 1103 papers were screened, and 38 were included as applications of TM in psychiatric research. Using TM and content analysis, we identified four major areas of application: (1) Psychopathology (i.e. observational studies focusing on mental illnesses), (2) the Patient perspective (i.e. patients' thoughts and opinions), (3) Medical records (i.e. safety issues, quality of care and description of treatments), and (4) Medical literature (i.e. identification of new scientific information in the literature). The information sources were qualitative studies, Internet postings, medical records and biomedical literature. Our work demonstrates that TM can contribute to complex research tasks in psychiatry. We discuss the benefits, limits, and further applications of this tool in the future. Copyright © 2015 John Wiley & Sons, Ltd.
Mining concepts of health responsibility using text mining and exploratory graph analysis.

Science.gov (United States)

Kjellström, Sofia; Golino, Hudson

2018-05-24

Occupational therapists need to know about people's beliefs about personal responsibility for health to help them pursue everyday activities. The study aims to employ state-of-the-art quantitative approaches to understand people's views of health and responsibility at different ages. A mixed method approach was adopted, using text mining to extract information from 233 interviews with participants aged 5 to 96 years, and then exploratory graph analysis to estimate the number of latent variables. The fit of the structure estimated via the exploratory graph analysis was verified using confirmatory factor analysis. Exploratory graph analysis estimated three dimensions of health responsibility: (1) creating good health habits and feeling good; (2) thinking about one's own health and wanting to improve it; and (3) adopting explicitly normative attitudes to take care of one's health. The comparison between the three dimensions among age groups showed, in general, that children and adolescents, as well as the old elderly (>73 years old) expressed ideas about personal responsibility for health less than young adults, adults and young elderly. Occupational therapists' knowledge of the concepts of health responsibility is of value when working with a patient's health, but an identified challenge is how to engage children and older persons.
30 CFR 75.302 - Main mine fans.

Science.gov (United States)

2010-07-01

... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Main mine fans. 75.302 Section 75.302 Mineral... SAFETY STANDARDS-UNDERGROUND COAL MINES Ventilation § 75.302 Main mine fans. Each coal mine shall be ventilated by one or more main mine fans. Booster fans shall not be installed underground to assist main mine...

Examining Mobile Learning Trends 2003-2008: A Categorical Meta-Trend Analysis Using Text Mining Techniques

Science.gov (United States)

Hung, Jui-Long; Zhang, Ke

2012-01-01

This study investigated the longitudinal trends of academic articles in Mobile Learning (ML) using text mining techniques. One hundred and nineteen (119) refereed journal articles and proceedings papers from the SCI/SSCI database were retrieved and analyzed. The taxonomies of ML publications were grouped into twelve clusters (topics) and four…
Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.

Directory of Open Access Journals (Sweden)

Allan Peter Davis

Full Text Available The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency.
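Mean average precision, one of the parameters listed above, rewards rankings that place relevant articles near the top. The sketch below shows how it is commonly computed for articles ranked by a relevancy score. The scores, relevance judgments and function names are hypothetical; how CTD actually computes the DRS is not described in this abstract and is not reproduced here.

```python
def rank_by_score(articles):
    """Sort (article_id, score, is_relevant) tuples by descending score."""
    return sorted(articles, key=lambda article: article[1], reverse=True)


def average_precision(ranked_relevance):
    """Average precision for one ranked list of booleans (True = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the average precision over several ranked lists."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Hypothetical articles: (id, relevancy score, curator judged it relevant).
articles = [("A", 0.9, True), ("B", 0.2, False), ("C", 0.7, False), ("D", 0.5, True)]
ranked = rank_by_score(articles)
relevance = [is_relevant for _, _, is_relevant in ranked]
ap = average_precision(relevance)
```

Here the ranking is A, C, D, B, so the relevant articles sit at ranks 1 and 3 and the average precision is (1/1 + 2/3) / 2. In a setting like the one described, one ranked list per chemical could be passed to `mean_average_precision`, though that grouping is an assumption.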
U.S. Nuclear Regulatory Commission nuclear safety assistance to the CEE and NIS countries

International Nuclear Information System (INIS)

Blaha, J.

2001-01-01

NRC participates in bilateral and multilateral efforts to strengthen the regulatory authorities of countries in which Soviet design NPPs are operated. Countries involved are the New Independent States of the Soviet Union (Armenia, Kazakhstan, Russia and Ukraine) and of Central and Eastern Europe (Bulgaria, Czech Republic, Hungary, Lithuania and Slovak Republic). NRC's goal is to see that its counterparts receive the basic tools, knowledge and understanding needed to exercise effective regulatory oversight, consistent with internationally accepted norms and standards. The bilateral assistance started in 1991. $44 mill. are provided to the countries. The multilateral activities NRC participates in include: H-7 Nuclear Safety Working Group, EBRD - Administered Nuclear Safety Account and Chernobyl Sarcophagus Fund and IAEA
Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature.

Science.gov (United States)

Wang, Qinghua; Ross, Karen E; Huang, Hongzhan; Ren, Jia; Li, Gang; Vijay-Shanker, K; Wu, Cathy H; Arighi, Cecilia N

2017-01-01

Post-translational modifications (PTMs) are one of the main contributors to the diversity of proteoforms in the proteomic landscape. In particular, protein phosphorylation represents an essential regulatory mechanism that plays a role in many biological processes. Protein kinases, the enzymes catalyzing this reaction, are key participants in metabolic and signaling pathways. Their activation or inactivation dictates downstream events: what substrates are modified and their subsequent impact (e.g., activation state, localization, protein-protein interactions (PPIs)). The biomedical literature continues to be the main source of evidence for experimental information about protein phosphorylation. Automatic methods to bring together phosphorylation events and phosphorylation-dependent PPIs can help to summarize the current knowledge and to expose hidden connections. In this chapter, we demonstrate two text mining tools, RLIMS-P and eFIP, for the retrieval and extraction of kinase-substrate-site data and phosphorylation-dependent PPIs from the literature. These tools offer several advantages over a literature search in PubMed as their results are specific for phosphorylation. RLIMS-P and eFIP results can be sorted, organized, and viewed in multiple ways to answer relevant biological questions, and the protein mentions are linked to UniProt identifiers.
76 FR 6587 - Pennsylvania Regulatory Program

Science.gov (United States)

2011-02-07

... [PA-159-FOR; OSM 2010-0017] Pennsylvania Regulatory Program AGENCY: Office of Surface Mining... remove a required amendment to the Pennsylvania regulatory program (the ``Pennsylvania program'') under... program amendment codified in the Federal regulations, Pennsylvania has submitted information that it...
Trends of E-Learning Research from 2000 to 2008: Use of Text Mining and Bibliometrics

Science.gov (United States)

Hung, Jui-long

2012-01-01

This study investigated the longitudinal trends of e-learning research using text mining techniques. Six hundred and eighty-nine (689) refereed journal articles and proceedings were retrieved from the Science Citation Index/Social Science Citation Index database in the period from 2000 to 2008. All e-learning publications were grouped into two…
Seqenv: linking sequences to environments through text mining

Directory of Open Access Journals (Sweden)

Lucas Sinclair

2016-12-01

Full Text Available Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the “nt” nucleotide database provided by NCBI and, out of every hit, extracts–if it is available–the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Science.gov (United States)

Singhal, Ayush; Simmons, Michael; Lu, Zhiyong

2016-11-01

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease
Text mining of rheumatoid arthritis and diabetes mellitus to understand the mechanisms of Chinese medicine in different diseases with same treatment.

Science.gov (United States)

Zhao, Ning; Zheng, Guang; Li, Jian; Zhao, Hong-Yan; Lu, Cheng; Jiang, Miao; Zhang, Chi; Guo, Hong-Tao; Lu, Ai-Ping

2018-01-09

To identify the commonalities between rheumatoid arthritis (RA) and diabetes mellitus (DM) to understand the mechanisms of Chinese medicine (CM) in different diseases with the same treatment. A text mining approach was adopted to analyze the commonalities between RA and DM according to CM and biological elements. The major commonalities were subsequently verified in RA and DM rat models, in which herbal formula for the treatment of both RA and DM identified via text mining was used as the intervention. Similarities were identified between RA and DM regarding the CM approach used for diagnosis and treatment, as well as the networks of biological activities affected by each disease, including the involvement of adhesion molecules, oxidative stress, cytokines, T-lymphocytes, apoptosis, and inflammation. The Ramulus Cinnamomi-Radix Paeoniae Alba-Rhizoma Anemarrhenae is an herbal combination used to treat RA and DM. This formula demonstrated similar effects on oxidative stress and inflammation in rats with collagen-induced arthritis, which supports the text mining results regarding the commonalities between RA and DM. Commonalities between the biological activities involved in RA and DM were identified through text mining, and both RA and DM might be responsive to the same intervention at a specific stage.
The hazardous nature of small scale underground mining in Ghana

Directory of Open Access Journals (Sweden)

K.J. Bansah

2016-01-01

Full Text Available Small scale mining continues to contribute significantly to the growth of Ghana's economy. However, the sector poses serious dangers to human health and the environment. Ground failures resulting from poorly supported stopes have led to injuries and fatalities in recent times. Dust and fumes from drilling and blasting of ore present health threats due to poor ventilation. Four prominent small scale underground mines were studied to identify the safety issues associated with small scale underground mining in Ghana. It is recognized that small scale underground mining in Ghana is inundated with unsafe acts and conditions including stope collapse, improper choice of working tools, absence of personal protective equipment and land degradation. Inadequate monitoring of the operations and lack of regulatory enforcement by the Minerals Commission of Ghana are major contributing factors to the environmental, safety and national security issues of the operations.
Data Mining of Acupoint Characteristics from the Classical Medical Text: DongUiBoGam of Korean Medicine

Directory of Open Access Journals (Sweden)

Taehyung Lee

2014-01-01

Full Text Available Throughout the history of East Asian medicine, different kinds of acupuncture treatment experiences have been accumulated in classical medical texts. Reexamining knowledge from classical medical texts is expected to provide meaningful information that could be utilized in current medical practices. In this study, we used data mining methods to analyze the association between acupoints and patterns of disorder with the classical medical book DongUiBoGam of Korean medicine. Using the term frequency-inverse document frequency (tf-idf) method, we quantified the significance of acupoints to their target patterns and, conversely, the significance of patterns to acupoints. Through these processes, we extracted characteristics of each acupoint based on the patterns it treats. We also drew practical information for selecting acupoints on certain patterns according to their association. Data analysis on DongUiBoGam’s acupuncture treatment gave us an insight into the main idea of DongUiBoGam. We strongly believe that our approach can provide a novel understanding of unknown characteristics of acupoint and pattern identification from the classical medical text using data mining methods.
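As a rough illustration of the tf-idf weighting mentioned above, the sketch below treats each pattern of disorder as a "document" and the acupoints prescribed for it as "terms". It uses one common variant (relative term frequency times the natural log of N over document frequency). The exact formula used in the study is not given in the abstract, and the pattern names, acupoint labels and function name are placeholders.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Map {doc_name: [terms]} to {doc_name: {term: tf-idf weight}}.

    tf  = count of term in the document / number of terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for terms in documents.values():
        doc_freq.update(set(terms))  # count each term once per document

    weights = {}
    for name, terms in documents.items():
        counts = Counter(terms)
        total = len(terms)
        weights[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Placeholder data: acupoints listed under each pattern; NOT from DongUiBoGam.
patterns = {
    "pattern_1": ["point_a", "point_b", "point_a"],
    "pattern_2": ["point_b", "point_c"],
    "pattern_3": ["point_b", "point_a"],
}
weights = tf_idf(patterns)
```

In this toy input `point_b` appears under every pattern, so its weight is zero everywhere, whereas `point_c` is specific to `pattern_2` and receives the highest weight there. Swapping the roles (acupoints as documents, patterns as terms) gives the converse significance of patterns to acupoints.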
A Study on Environmental Research Trends Using Text-Mining Method - Focus on Spatial information and ICT -

Science.gov (United States)

Lee, M. J.; Oh, K. Y.; Joung-ho, L.

2016-12-01

Recently there has been much research analysing the interaction between entities by text-mining analysis in various fields. In this paper, we aimed to quantitatively analyse research trends in the area of environmental research relating to either spatial information or ICT (Information and Communications Technology) by text-mining analysis. To do this, we applied a low-dimensional embedding method, clustering analysis, and association rules to find meaningful associative patterns of keywords that frequently appeared in the articles. As the authors suppose that KCI (Korea Citation Index) articles reflect academic demands, a total of 1228 KCI articles published from 1996 to 2015 were reviewed and analysed by the text-mining method. First, we retrieved KCI articles from the NDSL (National Discovery for Science Leaders) site. We then pre-processed their keywords selected from abstracts and classified them into separable sectors. We investigated the appearance rates and association rules of keywords for articles in the two fields: spatial information and ICT. In order to detect historic trends, analysis was conducted separately for four periods: 1996-2000, 2001-2005, 2006-2010, 2011-2015. These analyses were conducted using R software. As a result, we confirmed that environmental research relating to spatial information mainly focused on such fields as `GIS(35%)', `Remote-Sensing(25%)', `environmental theme map(15.7%)'. Next, `ICT technology(23.6%)', `ICT service(5.4%)', `mobile(24%)', `big data(10%)', `AI(7%)' are primarily emerging from environmental research relating to ICT. Thus, from the analysis results, this paper asserts that research trends and academic progress are well structured for reviewing recent spatial information and ICT technology, and the outcomes of the analysis can be adequate guidelines for establishing environmental policies and strategies. KEY WORDS: Big data, Text-mining, Environmental research, Spatial-information, ICT Acknowledgements: The
Nuclear regulation of South African mines: An industry perspective

International Nuclear Information System (INIS)

Wymer, D.G.

2001-01-01

South African mines have become subject to a rigid and prescriptive system of nuclear regulation that has its roots in the past when South Africa embarked upon a period of nuclear development spanning the full nuclear fuel cycle, and in which the South African gold mining industry once played a major part in the supply of uranium as a low grade by-product. Radiation hazards in the mines are generally very moderate, even in the few gold mines associated with uranium by-product, and do not warrant the type of regulatory attention normally applied to nuclear installations, or even to uranium mines. The continued imposition of strict nuclear regulatory requirements has caused severe financial hardship and threatens the survival of certain mining operations, while seemingly having little or no health benefits to workers or the public. With the development of modern, comprehensive mine health and safety legislation, a more appropriate, effective, and far less costly vehicle for controlling radiation hazards in mines now exists, utilizing the resources of the Mine Health and Safety Inspectorate. This approach is now being proposed, in the drafting of new legislation, as constituting a better alternative to the nuclear regulation of mines. (author)
Alkemio: association of chemicals with biomedical topics by text and data mining.

Science.gov (United States)

Gijón-Correas, José A; Andrade-Navarro, Miguel A; Fontaine, Jean F

2014-07-01

The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools to retrieve chemicals associated with eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high performing web tool ranking chemicals for any biomedical topics and it is free to non-commercial users. http://cbdm.mdc-berlin.de/~medlineranker/cms/alkemio. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
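Ranking by P-values computed from random simulations can be illustrated with a generic empirical P-value: the fraction of simulated (null) scores at least as high as the observed one. The sketch below is only that generic idea under stated assumptions. The scores, chemical names and the add-one correction are illustrative choices, and nothing here reflects Alkemio's actual scoring or simulation procedure.

```python
def empirical_p_value(observed, null_scores):
    """Share of null scores >= observed, with an add-one correction.

    The correction keeps the estimate above zero when no simulated score
    reaches the observed one.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def rank_chemicals(observed_scores, null_scores):
    """Return [(chemical, p_value)] sorted from most to least significant."""
    scored = [
        (chemical, empirical_p_value(score, null_scores))
        for chemical, score in observed_scores.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical relatedness scores; in practice the null scores would come
# from many random simulations rather than four hand-written values.
null_scores = [0.1, 0.2, 0.3, 0.4]
observed_scores = {"chem_x": 0.35, "chem_y": 0.05, "chem_z": 0.5}
ranking = rank_chemicals(observed_scores, null_scores)
```

With these toy values `chem_z` exceeds every simulated score and ranks first, while `chem_y` is indistinguishable from the null and ranks last.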
A method for extracting design rationale knowledge based on Text Mining

Directory of Open Access Journals (Sweden)

Liu Jihong

2017-01-01

Capturing design rationale (DR) knowledge and presenting it to designers in a suitable form is of great significance for design reuse and design innovation. Since design rationale research began in the 1970s, many teams have developed their own design rationale systems. However, existing DR acquisition systems are not intelligent enough and still require designers to perform many operations. In addition, existing design documents contain a large amount of DR knowledge that has not been well mined. Therefore, a method and system are needed to better extract DR knowledge from design documents. We have proposed a DRKH (design rationale knowledge hierarchy) model for DR representation. The DRKH model has three layers: the design intent layer, the design decision layer and the design basis layer. In this paper, we use a text mining method to extract DR from design documents and construct the DR model. Finally, a welding robot design specification is taken as an example to demonstrate the system interface.
Internet of Things in Health Trends Through Bibliometrics and Text Mining.

Science.gov (United States)

Konstantinidis, Stathis Th; Billis, Antonis; Wharrad, Heather; Bamidis, Panagiotis D

2017-01-01

Recently a new buzzword has slowly but surely emerged, namely the Internet of Things (IoT). The importance of IoT is identified worldwide both by organisations and governments and the scientific community with an incremental number of publications during the last few years. IoT in Health is one of the main pillars of this evolution, but limited research has been performed on future visions and trends. Thus, in this study we investigate the longitudinal trends of Internet of Things in Health through bibliometrics and use of text mining. Seven hundred seventy eight (778) articles were retrieved from the Web of Science database from 1998 to 2016. The publications are grouped into thirty (30) clusters based on abstract text analysis, resulting in some eight (8) trends of IoT in Health. Research in this field is obviously obtaining a worldwide character with specific trends, which are worth delineating to be in favour of some areas.
30 CFR 817.81 - Coal mine waste: General requirements.

Science.gov (United States)

2010-07-01

... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Coal mine waste: General requirements. 817.81... ACTIVITIES § 817.81 Coal mine waste: General requirements. (a) General. All coal mine waste disposed of in an... within a permit area, which are approved by the regulatory authority for this purpose. Coal mine waste...
30 CFR 816.81 - Coal mine waste: General requirements.

Science.gov (United States)

2010-07-01

... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Coal mine waste: General requirements. 816.81... ACTIVITIES § 816.81 Coal mine waste: General requirements. (a) General. All coal mine waste disposed of in an... within a permit area, which are approved by the regulatory authority for this purpose. Coal mine waste...
Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.

Science.gov (United States)

Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C

2018-08-01

Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.
Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

Directory of Open Access Journals (Sweden)

Ayush Singhal

2016-11-01

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets

Finding novel relationships with integrated gene-gene association network analysis of Synechocystis sp. PCC 6803 using species-independent text-mining.

Science.gov (United States)

Kreula, Sanna M; Kaewphan, Suwisa; Ginter, Filip; Jones, Patrik R

2018-01-01

The increasing move towards open access full-text scientific literature enhances our ability to utilize advanced text-mining methods to construct information-rich networks that no human will be able to grasp simply from 'reading the literature'. The utility of text-mining for well-studied species is obvious though the utility for less studied species, or those with no prior track-record at all, is not clear. Here we present a concept for how advanced text-mining can be used to create information-rich networks even for less well studied species and apply it to generate an open-access gene-gene association network resource for Synechocystis sp. PCC 6803, a representative model organism for cyanobacteria and first case-study for the methodology. By merging the text-mining network with networks generated from species-specific experimental data, network integration was used to enhance the accuracy of predicting novel interactions that are biologically relevant. A rule-based algorithm (filter) was constructed in order to automate the search for novel candidate genes with a high degree of likely association to known target genes by (1) ignoring established relationships from the existing literature, as they are already 'known', and (2) demanding multiple independent evidences for every novel and potentially relevant relationship. Using selected case studies, we demonstrate the utility of the network resource and filter to ( i ) discover novel candidate associations between different genes or proteins in the network, and ( ii ) rapidly evaluate the potential role of any one particular gene or protein. The full network is provided as an open-source resource.
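The two filter rules stated above, (1) ignore relationships already established in the literature and (2) demand multiple independent evidences, can be sketched as follows. This is only an illustration of those two rules, not the authors' algorithm: the edge format, the gene and source labels, and the `min_sources` threshold are hypothetical.

```python
from collections import defaultdict


def novel_candidates(edges, known_pairs, min_sources=2):
    """Keep gene pairs that are not already known and that are supported
    by at least `min_sources` distinct evidence sources."""
    sources = defaultdict(set)
    for gene_a, gene_b, source in edges:
        pair = frozenset((gene_a, gene_b))
        # Rule 1: skip self-loops and relationships already in the literature.
        if len(pair) < 2 or pair in known_pairs:
            continue
        sources[pair].add(source)
    # Rule 2: require several independent lines of evidence.
    return {pair: srcs for pair, srcs in sources.items() if len(srcs) >= min_sources}


# Hypothetical edges: (gene, gene, evidence source).
edges = [
    ("geneA", "geneB", "text_mining"),
    ("geneA", "geneB", "coexpression"),
    ("geneA", "geneC", "text_mining"),
    ("geneB", "geneD", "text_mining"),
    ("geneB", "geneD", "coexpression"),
]
known_pairs = {frozenset(("geneB", "geneD"))}

candidates = novel_candidates(edges, known_pairs)
for pair, srcs in candidates.items():
    print(sorted(pair), sorted(srcs))
```

Using `frozenset` for a pair makes the association undirected, so the same two genes listed in either order count as one relationship.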
30 CFR 785.13 - Experimental practices mining.

Science.gov (United States)

2010-07-01

... practices granting variances from the special environmental protection performance standards of sections 515... INTERIOR SURFACE COAL MINING AND RECLAMATION OPERATIONS PERMITS AND COAL EXPLORATION SYSTEMS UNDER REGULATORY PROGRAMS REQUIREMENTS FOR PERMITS FOR SPECIAL CATEGORIES OF MINING § 785.13 Experimental practices...
BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

Directory of Open Access Journals (Sweden)

Tsafnat Guy

2011-04-01

Background: The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest. Results: BICEPP first retrieves MEDLINE abstracts containing drug names, then selects tokens that best predict the list of drugs which represents the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validations revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, it was observed that the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task. Conclusions: BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. This may be used in pharmacovigilance applications to assist with rapid screening of large drug databases to identify important characteristics for further evaluation.
Unsupervised text mining methods for literature analysis: a case study for Thomas Pynchon's V.

Directory of Open Access Journals (Sweden)

Christos Iraklis Tsatsoulis

2013-08-01

We investigate the use of unsupervised text mining methods for the analysis of prose literature works, using Thomas Pynchon's novel 'V.' as a case study. Our results suggest that such methods may be employed to reveal meaningful information regarding the novel’s structure. We report results using a wide variety of clustering algorithms, several distinct distance functions, and different visualization techniques. The application of a simple topic model is also demonstrated. We discuss the meaningfulness of our results along with the limitations of our approach, and we suggest some possible paths for further study.
Text Mining of the Classical Medical Literature for Medicines That Show Potential in Diabetic Nephropathy

Directory of Open Access Journals (Sweden)

Lei Zhang

2014-01-01

Objectives. To apply modern text-mining methods to identify candidate herbs and formulae for the treatment of diabetic nephropathy. Methods. The method we developed includes three steps: (1) identification of candidate ancient terms; (2) systemic search and assessment of medical records written in classical Chinese; (3) preliminary evaluation of the effect and safety of candidates. Results. Ancient terms Xia Xiao, Shen Xiao, and Xiao Shen were determined as the most likely to correspond with diabetic nephropathy and used in text mining. A total of 80 Chinese formulae for treating conditions congruent with diabetic nephropathy recorded in medical books from Tang Dynasty to Qing Dynasty were collected. Sao si tang (also called Reeling Silk Decoction) was chosen to show the process of preliminary evaluation of the candidates. It had promising potential for development as a new agent for the treatment of diabetic nephropathy. However, further investigations about the safety to patients with renal insufficiency are still needed. Conclusions. The methods developed in this study offer a targeted approach to identifying traditional herbs and/or formulae as candidates for further investigation in the search for new drugs for modern disease. However, more effort is still required to improve our techniques, especially with regard to compound formulae.
Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.

Science.gov (United States)

Müller, H-M; Van Auken, K M; Li, Y; Sternberg, P W

2018-03-09

The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater. Tools that provide a means to search and mine the full text of literature thus represent an important way by which the efficiency of these processes can be improved. We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full text corpus to include the PubMed Central Open Access Subset (PMC OA), as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and takes advantage of Lucene indexing and search technology for efficient handling of millions of full text documents. Like Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results may now be viewed as highlighted passages in the context of full text. To facilitate biocuration efforts, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send resulting annotations to external curation databases. As an example of such a curation form, we describe integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium. Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements
A New Challenge for Information Mining

Directory of Open Access Journals (Sweden)

Roberto Paiano

2017-07-01

In the field of "Data Exploration", many approaches have been developed to solve the problem of managing big data that are also semantically rich. Nowadays, there is a strong need to support discovery-oriented applications, in which data discovery is a highly ad hoc, interactive process, by assisting users in navigating the data to find interesting objects. In this work, starting from a theoretical data exploration system, for which we identified the main features that such a system must have to provide an efficient exploratory experience, we propose a combination of two data exploration techniques, faceted navigation and data mining, with the aim of improving information discovery during exploration. This approach is better contextualized in Information Mining, which aims at discovering knowledge, i.e. more general patterns within objects or collections of objects.
Integrating Industrial Ecology Thinking into the Management of Mining Waste

Directory of Open Access Journals (Sweden)

Éléonore Lèbre

2015-10-01

Mining legacies are often dominated by large waste facilities and their associated environmental impacts. The most serious environmental problem associated with mine waste is heavy metals and acid leakage through a phenomenon called acid mine drainage (AMD). Interestingly, the toxicity of this leakage is partly due to the presence of valuable metals in the waste stream as a result of a diversity of factors influencing mining operations. A more preventive and recovery-oriented approach to waste management, integrated into mine planning and operations, could be both economically attractive and environmentally beneficial since it would: mitigate environmental impacts related to mine waste disposal (and consequently reduce the remediation costs); and increase the resource recovery at the mine site level. The authors argue that eco-efficiency and resilience (and the resulting increase in a mine’s lifetime) are both critical, yet overlooked, characteristics of sustainable mining operations. Based on these arguments, this paper proposes a framework to assist with identification of opportunities for improvement and to measure this improvement in terms of its contribution to a mine’s sustainability performance.
Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database

Science.gov (United States)

Johnson, Robin J.; Lay, Jean M.; Lennon-Hopkins, Kelley; Saraceni-Richards, Cynthia; Sciaky, Daniela; Murphy, Cynthia Grondin; Mattingly, Carolyn J.

2013-01-01

The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) is a public resource that curates interactions between environmental chemicals and gene products, and their relationships to diseases, as a means of understanding the effects of environmental chemicals on human health. CTD provides a triad of core information in the form of chemical-gene, chemical-disease, and gene-disease interactions that are manually curated from scientific articles. To increase the efficiency, productivity, and data coverage of manual curation, we have leveraged text mining to help rank and prioritize the triaged literature. Here, we describe our text-mining process that computes and assigns each article a document relevancy score (DRS), wherein a high DRS suggests that an article is more likely to be relevant for curation at CTD. We evaluated our process by first text mining a corpus of 14,904 articles triaged for seven heavy metals (cadmium, cobalt, copper, lead, manganese, mercury, and nickel). Based upon initial analysis, a representative subset corpus of 3,583 articles was then selected from the 14,094 articles and sent to five CTD biocurators for review. The resulting curation of these 3,583 articles was analyzed for a variety of parameters, including article relevancy, novel data content, interaction yield rate, mean average precision, and biological and toxicological interpretability. We show that for all measured parameters, the DRS is an effective indicator for scoring and improving the ranking of literature for the curation of chemical-gene-disease information at CTD. Here, we demonstrate how fully incorporating text mining-based DRS scoring into our curation pipeline enhances manual curation by prioritizing more relevant articles, thereby increasing data content, productivity, and efficiency. PMID:23613709
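One of the evaluation parameters named above, mean average precision, measures how well a score-based ranking places relevant articles near the top. The sketch below computes average precision for a single ranked list; the abstract does not describe how the DRS itself is computed, so the identifiers, scores and relevance labels here are invented for illustration only.

```python
def average_precision(ranked_relevance):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical articles: (identifier, relevancy score, judged relevant by a curator).
articles = [
    ("article_1", 12.0, False),
    ("article_2", 85.0, True),
    ("article_3", 40.0, True),
    ("article_4", 61.0, False),
]

# Rank by descending score, then evaluate the ranking against the curator labels.
ranked = sorted(articles, key=lambda item: item[1], reverse=True)
ap = average_precision([relevant for _, _, relevant in ranked])
print(f"average precision = {ap:.3f}")
```

Mean average precision is then the mean of this value over several ranked lists, for example one list per chemical.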
Assisting IAEA Member States to Strengthen Regulatory Control, Particularly in the Medical Area

International Nuclear Information System (INIS)

Johnston, P.

2016-01-01

As per its Statute and Mandate, IAEA is developing Safety Standards and is also providing assistance for their application in Member States. One target and very large audience of this programme is the community of national regulatory bodies for radiation safety, expected to be established in all 168 Member States. Ionizing radiation is being used throughout the world in medical practices and medical exposure is the most significant manmade source of exposure to the population from ionizing radiation. Radiation accidents involving medical uses have accounted for more injuries and early acute health effects than any other type of radiation accident, including accidents at nuclear facilities. With the constant emergence of new technologies using ionizing radiation for medical diagnostic and treatment, there are on-going challenges for Regulatory bodies. The presentation will highlight some figures related to the medical exposure worldwide, and then it will introduce the main safety standards and other publications developed specifically for Regulatory Bodies and focusing on medical practices. It will also highlight the most important and recent mechanisms (tools, peer reviews and advisory services, training courses, networks) that the Agency is offering to its Member States in order to cope with the main challenges worldwide, contributing thus to the efficiency and effectiveness of the regulatory oversight of medical facilities and activities. (author)
Blasting Standards for the Ghanaian Mining Industry | Amegbey ...

African Journals Online (AJOL)

Ghana is a well known mining nation and hard rock mining has been going on since the 10th century. Mining companies in Ghana are well aware of the regulatory requirements to carry out blasting activities such that neighbouring communities are protected from excessive impact as a result of blast vibrations amongst other ...
Phytostabilization of Mine Tailings Using Compost-Assisted Direct Planting: Translating Greenhouse Results to the Field

OpenAIRE

Gil-Loaiza, Juliana; White, Scott A.; Root, Robert A.; Solís-Dominguez, Fernando A.; Hammond, Corin M.; Chorover, Jon; Maier, Raina M.

2016-01-01

Standard practice in reclamation of mine tailings is the emplacement of a 15 to 90 cm soil/gravel/rock cap which is then hydro-seeded. In this study we investigate compost-assisted direct planting phytostabilization technology as an alternative to standard cap and plant practices. In phytostabilization the goal is to establish a vegetative cap using native plants that stabilize metals in the root zone with little to no shoot accumulation. The study site is a barren 62-hectare tailings pile ch...
Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track

Science.gov (United States)

2015-11-20

Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track Paul N. Bennett Microsoft Research Redmond, USA pauben...anchor text graph has proven useful in the general realm of query reformulation [2], we sought to quantify the value of extracting key phrases from...anchor text in the broader setting of the task understanding track. Given a query, our approach considers a simple method for identifying a relevant
Regulating Mining in South Africa and Zimbabwe: Communities, the Environment and Perpetual Exploitation

Directory of Open Access Journals (Sweden)

Tumai Murombo

2013-06-01

Mining as an extractive activity has the potential to promote sustainable economic growth in developing countries; however, this largely depends on how the activities are regulated. Mining contributes to environmental pollution and degradation, and the social degeneration of local communities. Corporate social responsibility initiatives are often self-serving short-term programs that in the long term do not benefit mining communities. In this article, the mining, environment and community trilemma is investigated through the lens of what is happening in South Africa and Zimbabwe. It is argued that continued calls for nationalisation and indigenisation are the sequel of the failure of postcolonial mineral law and policy reforms. Regulatory continuity from colonial laws has seen mining companies continue to treat mineral rich developing countries as sources of raw materials. Little is done to develop the communities impacted by mining activities. Recommendations are made on how mining can support sustainable development without creating a cycle of poverty within mining communities. This can happen through effective regulation embedded within sustainable development, transparency and accountability and equitable access to mineral wealth.
Argo: an integrative, interactive, text mining-based workbench supporting curation

Science.gov (United States)

Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia

2012-01-01

Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in
E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter

OpenAIRE

Lazard, Allison J; Saffer, Adam J; Wilcox, Gary B; Chung, Arnold DongWoo; Mackert, Michael S; Bernhardt, Jay M

2016-01-01

Background As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. Objective This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. Methods Text mining uncovered key patterns and important topics for e-cigarettes on Twitter. SAS Text Miner 12.1 software (SAS Institute Inc) was used for descriptiv...
Text mining for search term development in systematic reviewing: A discussion of some methods and challenges.

Science.gov (United States)

Stansfield, Claire; O'Mara-Eves, Alison; Thomas, James

2017-09-01

Using text mining to aid the development of database search strings for topics described by diverse terminology has potential benefits for systematic reviews; however, methods and tools for accomplishing this are poorly covered in the research methods literature. We briefly review the literature on applications of text mining for search term development for systematic reviewing. We found that the tools can be used in 5 overarching ways: improving the precision of searches; identifying search terms to improve search sensitivity; aiding the translation of search strategies across databases; searching and screening within an integrated system; and developing objectively derived search strategies. Using a case study and selected examples, we then reflect on the utility of certain technologies (term frequency-inverse document frequency and Termine, term frequency, and clustering) in improving the precision and sensitivity of searches. Challenges in using these tools are discussed. The utility of these tools is influenced by the different capabilities of the tools, the way the tools are used, and the text that is analysed. Increased awareness of how the tools perform facilitates the further development of methods for their use in systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.
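Term frequency-inverse document frequency, one of the technologies mentioned above, weights a term highly when it is frequent in one record but rare across the collection, which is what makes it useful for spotting candidate search terms. The following is a minimal sketch using whitespace tokenisation and the plain `tf * log(N / df)` weighting; real tools differ in tokenisation, smoothing and normalisation, and the three sample records are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical titles standing in for retrieved records.
records = [
    "text mining for systematic reviews",
    "text mining of clinical records",
    "text search strategies for systematic reviews",
]

weights = tf_idf(records)
for record, scores in zip(records, weights):
    top_term = max(scores, key=scores.get)
    print(f"{record!r}: top term = {top_term}")
```

A term that occurs in every record, such as "text" here, receives a weight of zero, which reflects that it cannot help a search distinguish between records.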
Whole field tendencies in transcranial magnetic stimulation: A systematic review with data and text mining.

Science.gov (United States)

Dias, Alvaro Machado; Mansur, Carlos Gustavo; Myczkowski, Martin; Marcolin, Marco

2011-06-01

Transcranial magnetic stimulation (TMS) has played an important role in the fields of psychiatry, neurology and neuroscience, since its emergence in the mid-1980s; and several high quality reviews have been produced since then. Most high quality reviews serve as powerful tools in the evaluation of predefined tendencies, but they cannot actually uncover new trends within the literature. However, special statistical procedures to 'mine' the literature have been developed which aid in achieving such a goal. This paper aims to uncover patterns within the literature on TMS as a whole, as well as specific trends in the recent literature on TMS for the treatment of depression. Data mining and text mining. Currently there are 7299 publications, which can be clustered in four essential themes. Considering the frequency of the core psychiatric concepts within the indexed literature, the main results are: depression is present in 13.5% of the publications; Parkinson's disease in 2.94%; schizophrenia in 2.76%; bipolar disorder in 0.158%; and anxiety disorder in 0.142% of all the publications indexed in PubMed. Several other perspectives are discussed in the article. Copyright © 2011 Elsevier B.V. All rights reserved.
A text-based data mining and toxicity prediction modeling system for a clinical decision support in radiation oncology: A preliminary study

Science.gov (United States)

Kim, Kwang Hyeon; Lee, Suk; Shim, Jang Bo; Chang, Kyung Hwan; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Kim, Chul Yong; Cao, Yuan Jie

2017-08-01

The aim of this study is an integrated research for text-based data mining and toxicity prediction modeling system for clinical decision support system based on big data in radiation oncology as a preliminary research. The structured and unstructured data were prepared by treatment plans and the unstructured data were extracted by dose-volume data image pattern recognition of prostate cancer for research articles crawling through the internet. We modeled an artificial neural network to build a predictor model system for toxicity prediction of organs at risk. We used a text-based data mining approach to build the artificial neural network model for bladder and rectum complication predictions. The pattern recognition method was used to mine the unstructured toxicity data for dose-volume at the detection accuracy of 97.9%. The confusion matrix and training model of the neural network were achieved with 50 modeled plans (n = 50) for validation. The toxicity level was analyzed and the risk factors for 25% bladder, 50% bladder, 20% rectum, and 50% rectum were calculated by the artificial neural network algorithm. As a result, 32 plans could cause complication but 18 plans were designed as non-complication among 50 modeled plans. We integrated data mining and a toxicity modeling method for toxicity prediction using prostate cancer cases. It is shown that a preprocessing analysis using text-based data mining and prediction modeling can be expanded to personalized patient treatment decision support based on big data.
GIS-assisted spatial analysis for urban regulatory detailed planning: designer's dimension in the Chinese code system

Science.gov (United States)

Yu, Yang; Zeng, Zheng

2009-10-01

By discussing the causes behind the high amendment ratio in the implementation of urban regulatory detailed plans in China despite their law-ensured status, the study aims to reconcile the conflict between the legal authority of regulatory detailed planning and the insufficient scientific support in its decision-making and compilation. It does so by introducing spatial analysis based on GIS technology and 3D modeling into the process, thus presenting a more scientific and flexible approach to regulatory detailed planning in China. The study first points out that the current compilation process of urban regulatory detailed plans in China employs mainly an empirical approach, which renders it constantly subject to amendments; the study then discusses the need for and current utilization of GIS in the Chinese system and proposes the framework of a GIS-assisted 3D spatial analysis process from the designer's perspective, which can be regarded as an alternating process between the descriptive codes and physical design in the compilation of regulatory detailed planning. With a case study of the processes and results from the application of the framework, the paper concludes that the proposed framework can be an effective instrument that provides more rationality, flexibility and thus more efficiency to the compilation and decision-making process of urban regulatory detailed plans in China.

Analysing Customer Opinions with Text Mining Algorithms

Science.gov (United States)

Consoli, Domenico

2009-08-01

Knowing what the customer thinks of a particular product/service helps top management to introduce improvements in processes and products, thus differentiating the company from its competitors and gaining competitive advantages. The customers, with their preferences, determine the success or failure of a company. In order to know the opinions of customers we can use technologies available from Web 2.0 (blogs, wikis, forums, chat, social networking, social commerce). From these web sites, useful information must be extracted, for strategic purposes, using techniques of sentiment analysis or opinion mining.
Regulatory Oversight of the Legacy Gunnar Uranium Mine and Mill Site in Northern Saskatchewan, Canada - 13434

Energy Technology Data Exchange (ETDEWEB)

Stenson, Ron; Howard, Don [Canadian Nuclear Safety Commission, P.O. Box 1046, Station B, 280 Slater Street, Ottawa ON K1P 5S9 (Canada)

2013-07-01

As Canada's nuclear regulator, the Canadian Nuclear Safety Commission (CNSC) is responsible for licensing all aspects of uranium mining, including remediation activities at legacy sites. Since these sites already existed when the current legislation came into force in 2000, and the previous legislation did not apply, they present a special case. The Nuclear Safety and Control Act (NSCA) was written with cradle-to-grave oversight in mind. Applying the NSCA at the end of a facility's life cycle poses some challenges to both the regulator and the proponent. When the proponent is the public sector, even more challenges can present themselves. Although the licensing process for legacy sites is no different than for any other CNSC license, assuring regulatory compliance can be more complicated. To demonstrate how the CNSC has approached the oversight of legacy sites, the history of the Commission's involvement with the Gunnar uranium mine and mill site provides a good case study. The lessons learned from the CNSC's experience regulating the Gunnar site will benefit those in the future who will need to regulate legacy sites under existing or new legislation. (authors)
Occupational health and safety in underground mines

International Nuclear Information System (INIS)

Martinson, M.J.

1976-01-01

An historical review of the health hazards associated with the inhalation of airborne radionuclides in uranium mines is given. A set of regulations regarding radiation standards for uranium mining was approved by the American President in 1967. Since then the hazard of uranium mining has been subjected to searching public enquiry at Congressional Hearings and has been the subject of an unprecedented spate of regulatory standards. Design criteria for mine ventilation are described.
Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

Science.gov (United States)

Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

2000-01-01

These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
Mine Water Treatment in Hongai Coal Mines

Directory of Open Access Journals (Sweden)

Dang Phuong Thao

2018-01-01

Acid mine drainage (AMD) is recognized as one of the most serious environmental problems associated with the mining industry. Acid water, also known as acid mine drainage, forms when iron sulfide minerals found in the rock of coal seams are exposed to oxidizing conditions during coal mining. Until 2009, mine drainage in Hongai coal mines was not treated, leading to harmful effects on humans, animals and aquatic ecosystems. This report examines the acid mine drainage problem and techniques for acid mine drainage treatment in Hongai coal mines. In addition, selection and design criteria for the treatment systems are presented.
The Determination of Children's Knowledge of Global Lunar Patterns from Online Essays Using Text Mining Analysis

Science.gov (United States)

Cheon, Jongpil; Lee, Sangno; Smith, Walter; Song, Jaeki; Kim, Yongjin

2013-01-01

The purpose of this study was to use text mining analysis of early adolescents' online essays to determine their knowledge of global lunar patterns. Australian and American students in grades five to seven wrote about global lunar patterns they had discovered by sharing observations with each other via the Internet. These essays were analyzed for…
Effects of Computer-Assisted Instruction with Conceptual Change Texts on Removing the Misconceptions of Radioactivity

Directory of Open Access Journals (Sweden)

Ahmet YUMUŞAK

2016-12-01

Training young scientists and enabling conceptual understanding are important in science education. Misconceptions are an important indicator of whether concepts have been understood. Conceptual change texts are among the most important educational tools for removing misconceptions, and computer-assisted instruction is another important method. The goal of this study is to investigate the effects of computer-assisted instruction (CAI), conceptual change texts (CCT), computer-assisted instruction with conceptual change texts (CAI+CCT), and the traditional teaching method (TTM) on removing the misconceptions of science teacher candidates on the subject of radioactivity. The research sample consisted of 92 senior students in four groups at Celal Bayar University, Faculty of Education, Department of Science Education, in the 2011-2012 academic year. A different teaching method was used in each group. Experimental groups were randomly determined: in the first experimental group, computer-assisted instruction was used (23 students); in the second, conceptual change texts were used (23 students); in the third, computer-assisted instruction with conceptual change texts was used (23 students); and the fourth group, in which the traditional teaching method was used, served as the control group (23 students). A two-tier misconception diagnostic instrument, developed by the researcher, was used as the data collection tool. A “Nonequivalent Control Groups Experimental Design” was used to determine the efficiency of the different teaching methods. The data were analyzed using SPSS 21.0. The study determined that the methods used in the experimental groups were more successful than the traditional teaching method practiced in the control group in terms of removing misconceptions on
Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes.

Directory of Open Access Journals (Sweden)

Nicholas J Leeper

Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.
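The odds ratios and confidence intervals quoted in this abstract follow a standard form. As a minimal sketch only, the block below computes an odds ratio with a Wald (Woolf) 95% interval from a 2x2 table. The function name `odds_ratio_ci` and the event counts are hypothetical and are not the study's data; only the group sizes (232 treated, 1:5 matching) come from the abstract, and the abstract does not say which estimator the authors used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: exposed with event,   b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Group sizes follow the 1:5 matching described in the abstract.
treated = 232
controls = 5 * treated

# Hypothetical event counts, for illustration only (NOT the study's data).
or_, lo, hi = odds_ratio_ci(a=30, b=treated - 30, c=150, d=controls - 150)
print(f"controls={controls}, OR={or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An interval that contains 1, as in each of the three outcomes reported above, is what "no association" means in this setting.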
Challenges of development of regulatory control infrastructure for uranium mining in developing countries (Tanzania) to achieve regulatory compliance

International Nuclear Information System (INIS)

Kileo, A.; Mwalongo, D.; Mkilaha, I.; Mwaipopo, A.

2014-01-01

Managing radiation and waste in uranium mining is of paramount importance for the protection of occupational workers, the public and the environment. Responsibilities of the parties which are involved in the part of the Nuclear Fuel Cycle are outlined in the legislations and regulations governing uranium prospecting, mining and processing. The Tanzania Atomic Energy Commission, as the regulator for radiation and atomic energy, has developed regulations for exploration, construction, mining, milling, packaging, transport of yellow cake and decommissioning of uranium mine site in Tanzania. This paper outlines the development of these regulations and compares with the international standards. The paper also reviews and analyses gaps and shortcomings for safe uranium mining in United Republic of Tanzania. (author)
Assertions of Japanese Websites for and Against Cancer Screening: a Text Mining Analysis

Science.gov (United States)

Okuhara, Tsuyoshi; Ishikawa, Hirono; Okada, Masahumi; Kato, Mio; Kiuchi, Takahiro

2017-04-01

Background: Cancer screening rates are lower in Japan than in Western countries such as the United States and the United Kingdom. While health professionals publish pro-cancer-screening messages online to encourage proactive seeking for screening, anti-screening activists use the same medium to warn readers against following guidelines. Contents of pro- and anti-cancer-screening sites may contribute to readers’ acceptance of one or the other position. We aimed to use a text-mining method to examine frequently appearing contents on sites for and against cancer screening. Methods: We conducted online searches in December 2016 using two major search engines in Japan (Google Japan and Yahoo! Japan). Targeted websites were classified as “pro”, “anti”, or “neutral” depending on their claims, with the author(s) classified as “health professional”, “mass media”, or “layperson”. Text-mining analyses were conducted, and statistical analysis was performed using the chi-square test. Results: Of the 169 websites analyzed, the top-three most frequently appearing content topics in pro sites were reducing mortality via cancer screening, benefits of early detection, and recommendations for obtaining detailed examination. The top three most frequent in anti-sites were harm from radiation exposure, non-efficacy of cancer screening, and lack of necessity of early detection. Anti-sites also frequently referred to a well-known Japanese radiologist, Makoto Kondo, who rejects the standard forms of cancer care. Conclusion: Our findings should enable authors of pro-cancer-screening sites to write to counter misleading anti-cancer-screening messages and facilitate dissemination of accurate information. Creative Commons Attribution License
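The chi-square comparison mentioned in the Methods can be sketched as follows. This is an illustrative implementation of Pearson's chi-square statistic for a contingency table; the function name `chi_square` and the counts are hypothetical, and the abstract does not report the underlying table.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts: rows = pro / anti sites, columns = topic present / absent.
counts = [[10, 20], [20, 10]]
stat, dof = chi_square(counts)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

The statistic would then be compared against the chi-square distribution with `dof` degrees of freedom to obtain a p-value.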
Law 19.126. It dictate Regulatory standards about Mining of great bearing

International Nuclear Information System (INIS)

2013-01-01

It establishes rules for regulating large mining projects: ownership, location, related mining activities, mine closure plans, exploitation concession contracts, taxation regime, canon, infractions and sanctions.
Translation Memory and Computer Assisted Translation Tool for Medieval Texts

Directory of Open Access Journals (Sweden)

Törcsvári Attila

2013-05-01

Translation memories (TMs), as part of Computer Assisted Translation (CAT) tools, support translators in reusing portions of formerly translated text. Fencing books are good candidates for using TMs due to the high number of repeated terms. Medieval texts suffer from a number of drawbacks that make even “simple” rewording into the modern version of the same language hard. The analyzed difficulties are: lack of systematic spelling, unusual word orders and typos in the original. A hypothesis is made and verified that even simple modernization increases legibility and is feasible, and that it is worthwhile to apply translation memories due to the numerous and even extremely long repeated terms. Therefore, methods and algorithms are presented for (1) automated transcription of medieval texts (when a limited training set is available) and (2) collection of repeated patterns. The efficiency of the algorithms is analyzed for recall and precision.
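The "collection of repeated patterns" step can be illustrated with a minimal sketch that counts word n-grams occurring more than once. This is a generic approach assumed for illustration, not the algorithm of the paper; `repeated_phrases` and the sample sentence are hypothetical.

```python
from collections import Counter


def repeated_phrases(text, n=3, min_count=2):
    """Return word n-grams that occur at least min_count times in text."""
    tokens = text.lower().split()
    counts = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {" ".join(p): c for p, c in counts.items() if c >= min_count}


# Invented sample in the style of a fencing manual (not from the source texts).
sample = "strike from above then step back strike from above then thrust"
print(repeated_phrases(sample))
```

Phrases found this way are candidates for translation-memory entries, since each one needs translating only once.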
Automated Assessment of Patients' Self-Narratives for Posttraumatic Stress Disorder Screening Using Natural Language Processing and Text Mining.

Science.gov (United States)

He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo

2017-03-01

Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms-including decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model-were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
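The n-gram representation mentioned above can be sketched in a few lines. The block shows only how unigram and bigram counts might be derived from a narrative before being passed to a classifier; it is not the authors' pipeline, and `ngrams`, `ngram_features`, and the sample sentence are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of word n-grams (as strings) in a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count all n-grams of length 1..max_n in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Invented example sentence, not taken from the study's data.
features = ngram_features("I could not sleep I could not eat")
print(features.most_common(4))
```

Unigrams capture which words appear, while bigrams such as "could not" retain some local word order.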
Data preparation for municipal virtual assistant using machine learning

OpenAIRE

Jovan, Leon Noe

2016-01-01

The main goal of this master’s thesis was to develop a procedure that will automate the construction of the knowledge base for a virtual assistant that answers questions about municipalities in Slovenia. The aim of the procedure is to replace or facilitate manual preparation of the virtual assistant's knowledge base. Theoretical backgrounds of different machine learning fields, such as multilabel classification, text mining and learning from weakly labeled data were examined to gain a better ...
Automated assessment of patients' self-narratives for posttraumatic stress disorder screening using natural language processing and text mining

NARCIS (Netherlands)

He, Qiwei; Veldkamp, Bernard P.; Glas, Cornelis A.W.; de Vries, Theo

2017-01-01

Patients’ narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four
DDMGD: the database of text-mined associations between genes methylated in diseases from different species

KAUST Repository

Raies, A. B.

2014-11-14

Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases.
Restoring Forests and Associated Ecosystem Services on Appalachian Coal Surface Mines

Science.gov (United States)

Zipper, Carl E.; Burger, James A.; Skousen, Jeffrey G.; Angel, Patrick N.; Barton, Christopher D.; Davis, Victor; Franklin, Jennifer A.

2011-05-01

Surface coal mining in Appalachia has caused extensive replacement of forest with non-forested land cover, much of which is unmanaged and unproductive. Although forested ecosystems are valued by society for both marketable products and ecosystem services, forests have not been restored on most Appalachian mined lands because traditional reclamation practices, encouraged by regulatory policies, created conditions poorly suited for reforestation. Reclamation scientists have studied productive forests growing on older mine sites, established forest vegetation experimentally on recent mines, and identified mine reclamation practices that encourage forest vegetation re-establishment. Based on these findings, they developed a Forestry Reclamation Approach (FRA) that can be employed by coal mining firms to restore forest vegetation. Scientists and mine regulators, working collaboratively, have communicated the FRA to the coal industry and to regulatory enforcement personnel. Today, the FRA is used routinely by many coal mining firms, and thousands of mined hectares have been reclaimed to restore productive mine soils and planted with native forest trees. Reclamation of coal mines using the FRA is expected to restore these lands' capabilities to provide forest-based ecosystem services, such as wood production, atmospheric carbon sequestration, wildlife habitat, watershed protection, and water quality protection to a greater extent than conventional reclamation practices.
A Text Matching Method to Facilitate the Validation of Frequent Order Sets Obtained Through Data Mining

OpenAIRE

Che, Chengjian; Rocha, Roberto A.

2006-01-01

In order to compare order sets discovered using a data mining algorithm with existing order sets, we developed an order matching tool based on Oracle Text. The tool includes both automated searching and manual review processes. The comparison between the automated process and the manual review process indicates that the sensitivity of the automated matching is 81% and the specificity is 84%.
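For reference, sensitivity and specificity are derived from the counts of correct and incorrect matches. The sketch below uses hypothetical counts chosen only so that the rates equal the reported 81% and 84%; the abstract does not give the actual counts, and the function name is illustrative.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (NOT from the study), with manual review as the reference.
sens, spec = sensitivity_specificity(tp=81, fn=19, tn=84, fp=16)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```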
Text mining with emergent self organizing maps and multi-dimensional scaling: a comparative study on domestic violence

NARCIS (Netherlands)

Poelmans, J.; van Hulle, M.M.; Viaene, S.; Elzinga, P.; Dedene, G.

2011-01-01

In this paper we compare the usability of ESOM and MDS as text exploration instruments in police investigations. We combine them with traditional classification instruments such as the SVM and Naïve Bayes. We perform a case of real-life data mining using a dataset consisting of police reports
Mining and sustainable development: environmental policies and programmes of mining industry associations

International Nuclear Information System (INIS)

Miller, C.G.

1997-01-01

Mining industry policies and practices have evolved rapidly in the environmental area, and more recently in the social area as well. Mining industry associations are using a variety of methods to stimulate and assist their member companies as they improve their environmental, social and economic performance. These associations provide opportunities for companies to use collaborative approaches in developing and applying improved technology, systems and practices (author)

PubMed-EX: a web browser extension to enhance PubMed search with text mining features.

Science.gov (United States)

Tsai, Richard Tzong-Han; Dai, Hong-Jie; Lai, Po-Ting; Huang, Chi-Hsin

2009-11-15

PubMed-EX is a browser extension that marks up PubMed search results with additional text-mining information. PubMed-EX's page mark-up, which includes section categorization and gene/disease and relation mark-up, can help researchers to quickly focus on key terms and provide additional information on them. All text processing is performed server-side, freeing up user resources. PubMed-EX is freely available at http://bws.iis.sinica.edu.tw/PubMed-EX and http://iisr.cse.yzu.edu.tw:8000/PubMed-EX/.
Texts and data mining and their possibilities applied to the process of news production

Directory of Open Access Journals (Sweden)

Walter Teixeira Lima Jr

2008-06-01

The proposal of this essay is to discuss the challenges of representing in a formalist computational process the knowledge which the journalist uses to articulate news values for the purpose of selecting and imposing hierarchy on news. It discusses how to make bridges to emulate this knowledge obtained in an empirical form with the bases of computational science, in the area of storage, recovery and linked to data in a database, which must show the way human brains treat information obtained through their sensorial system. Systemizing and automating part of the journalistic process in a database contributes to eliminating distortions, faults and to applying, in an efficient manner, techniques for Data Mining and/or Texts which, by definition, permit the discovery of nontrivial relations.
Texts and data mining and their possibilities applied to the process of news production

Directory of Open Access Journals (Sweden)

Walter Teixeira Lima Jr

2011-02-01

The proposal of this essay is to discuss the challenges of representing in a formalist computational process the knowledge which the journalist uses to articulate news values for the purpose of selecting and imposing hierarchy on news. It discusses how to make bridges to emulate this knowledge obtained in an empirical form with the bases of computational science, in the area of storage, recovery and linked to data in a database, which must show the way human brains treat information obtained through their sensorial system. Systemizing and automating part of the journalistic process in a database contributes to eliminating distortions, faults and to applying, in an efficient manner, techniques for Data Mining and/or Texts which, by definition, permit the discovery of nontrivial relations.
Interactive text mining with Pipeline Pilot: a bibliographic web-based tool for PubMed.

Science.gov (United States)

Vellay, S G P; Latimer, N E Miller; Paillard, G

2009-06-01

Text mining has become an integral part of all research in the medical field. Many text analysis software platforms support particular use cases and only those. We show an example of a bibliographic tool that can be used to support virtually any use case in an agile manner. Here we focus on a Pipeline Pilot web-based application that interactively analyzes and reports on PubMed search results. This will be of interest to any scientist to help identify the most relevant papers in a topical area more quickly and to evaluate the results of query refinement. Links with Entrez databases help both the biologist and the chemist alike. We illustrate this application with Leishmaniasis, a neglected tropical disease, as a case study.
Mining-Related Selenium Contamination in Alaska, and the State of Current Knowledge

Directory of Open Access Journals (Sweden)

Aibyek Khamkhash

2017-03-01

Selenium pollution has been a topic of extensive research dating back further than the last decade and has attracted significant attention from several environmental and regulatory agencies in order to monitor and control its discharge from myriad industrial sources. The mining industry is a prime contributor of hazardous selenium release into aquatic systems and is responsible for both acute and chronic impacts on living organisms. Herein we provide an overview of selenium contamination issues, with a specific focus on selenium release from mining industries, including a discussion of various technologies commonly employed to treat selenium-impacted waters from mining discharge. Different cases pertaining to selenium release from Alaskan mines (during years 2000–2015) are also presented, along with measures taken to mitigate high concentration releases. For continued resource exploration and economic development activities, as well as environmental preservation, it is important to fundamentally understand such emerging and pressing issues as selenium contamination and investigate efficient technological approaches to counter these challenges.
The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews.

Science.gov (United States)

Hao, Haijing; Zhang, Kunpeng

2016-05-10

Many Web-based health care platforms allow patients to evaluate physicians by posting open-end textual reviews based on their experiences. These reviews are helpful resources for other patients to choose high-quality doctors, especially in countries like China where no doctor referral systems exist. Analyzing such a large amount of user-generated content to understand the voice of health consumers has attracted much attention from health care providers and health care researchers. The aim of this paper is to automatically extract hidden topics from Web-based physician reviews using text-mining techniques to examine what Chinese patients have said about their doctors and whether these topics differ across various specialties. This knowledge will help health care consumers, providers, and researchers better understand this information. We conducted two-fold analyses on the data collected from the "Good Doctor Online" platform, the largest online health community in China. First, we explored all reviews from 2006-2014 using descriptive statistics. Second, we applied the well-known topic extraction algorithm Latent Dirichlet Allocation to more than 500,000 textual reviews from over 75,000 Chinese doctors across four major specialty areas to understand what Chinese health consumers said online about their doctor visits. On the "Good Doctor Online" platform, 112,873 out of 314,624 doctors had been reviewed at least once by April 11, 2014. Among the 772,979 textual reviews, we chose to focus on four major specialty areas that received the most reviews: Internal Medicine, Surgery, Obstetrics/Gynecology and Pediatrics, and Chinese Traditional Medicine. Among the doctors who received reviews from those four medical specialties, two-thirds of them received more than two reviews and in a few extreme cases, some doctors received more than 500 reviews. Across the four major areas, the most popular topics reviewers found were the experience of finding doctors, doctors' technical
30 CFR 906.10 - State regulatory program approval.

Science.gov (United States)

2010-07-01

... 30 Mineral Resources 3 2010-07-01 2010-07-01 false State regulatory program approval. 906.10 Section 906.10 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE COLORADO § 906.10 State...
Modelling of radon control and air cleaning requirements in underground uranium mines

International Nuclear Information System (INIS)

El Fawal, M.; Gadalla, A.

2014-01-01

As part of a comprehensive study concerned with controlling workplace short-lived radon daughter concentrations in underground uranium mines to safe levels, a computer program has been developed and verified to calculate ventilation parameters, e.g. local pressures, flow rates and radon daughter concentration levels. The computer program is composed of two parts, one for mine ventilation and the other for radon daughter level calculations. This program has been validated in an actual case study to calculate the radon concentration levels, pressures and flow rates required to maintain acceptable levels of radon concentration at each point of the mine. The required fan static pressure and the approximate energy consumption were also estimated. The results of the calculations have been evaluated and compared with a similar investigation. It was found that the calculated values are in good agreement with the corresponding values obtained using the "REDES" standard ventilation modelling software. The developed computer model can be used as a tool to help in the evaluation of ventilation systems proposed by the mining authority, and to assist the uranium mining industry in maintaining the health and safety of workers underground while efficiently achieving economic production targets. It could also be used for regulatory inspection and radiation protection assessments of workers in underground mining. Using this model, one can also effectively design, assess and manage underground mine ventilation systems. Values of radon decay product concentration in units of working level, and the pressure drops and flow rates required to reach acceptable radon concentrations relative to the recommended levels at different extraction points in the mine, as well as the fan static pressure, can be estimated; these are not available using other software. (author)
Text mining factor analysis (TFA) in green tea patent data

Science.gov (United States)

Rahmawati, Sela; Suprijadi, Jadi; Zulhanif

2017-03-01

Factor analysis has become one of the most widely used multivariate statistical procedures in applied research across a multitude of domains. There are two main types of analysis based on factor analysis: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). Both EFA and CFA aim to model the observed relationships among a group of indicators with a latent variable, but they differ fundamentally in the a priori assumptions and restrictions placed on the factor model. These methods are applied to patent data in the green tea technology sector to determine how green tea technology has developed worldwide. Patent analysis is useful in identifying future technological trends in a specific field of technology. The patent database was obtained from the European Patent Organization (EPO). In this paper, the CFA model is applied to nominal data obtained from a presence-absence matrix; the CFA for nominal data is based on the tetrachoric correlation matrix. Meanwhile, the EFA model is applied to the titles from the dominant technology sector. The titles are first pre-processed using text mining analysis.
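The presence-absence matrix and the tetrachoric correlation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the patent titles and terms below are hypothetical, and the correlation uses the classical "cos-pi" approximation rather than a full maximum-likelihood tetrachoric estimate.

```python
import math


def presence_absence(docs, terms):
    """Return one 0/1 row per document: 1 if the term occurs in the document."""
    rows = []
    for doc in docs:
        tokens = set(doc.lower().split())
        rows.append([1 if term in tokens else 0 for term in terms])
    return rows


def two_by_two(rows, i, j):
    """Cross-tabulate columns i and j into cells (a, b, c, d).

    a: both present, b: only i present, c: only j present, d: both absent.
    """
    a = sum(1 for r in rows if r[i] == 1 and r[j] == 1)
    b = sum(1 for r in rows if r[i] == 1 and r[j] == 0)
    c = sum(1 for r in rows if r[i] == 0 and r[j] == 1)
    d = sum(1 for r in rows if r[i] == 0 and r[j] == 0)
    return a, b, c, d


def tetrachoric_cos_pi(a, b, c, d):
    """Approximate tetrachoric correlation: cos(pi / (1 + sqrt(ad / bc)))."""
    if a * d == 0 and b * c == 0:
        raise ValueError("correlation is undefined for this table")
    if b * c == 0:
        return 1.0   # no discordant pairs
    if a * d == 0:
        return -1.0  # no concordant pairs
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical titles and terms, for illustration only.
titles = ["green tea extract", "tea beverage", "green beverage"]
terms = ["green", "tea"]
matrix = presence_absence(titles, terms)
table = two_by_two(matrix, 0, 1)
```

A matrix of such pairwise coefficients over all terms would then stand in for the Pearson correlation matrix as input to the factor model.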
The regulation of uranium mining in the Northern Territory

International Nuclear Information System (INIS)

Wedd, M.

1989-01-01

The regulatory framework developed for uranium mining operations in the Northern Territory is reviewed. The respective roles of the Commonwealth Government, State Government and other regulatory authorities are described. Whilst complex, expensive and cumbersome, the regulatory process has so far ensured input from diverse interest groups and has allowed for environmental protection control in the Alligator River Region
INTEGRATED REMOTE SENSING AND GIS FOR MODELING ECONOMIC REHABILITATION DEVELOPMENT OF EX-MINE SITES

Directory of Open Access Journals (Sweden)

Junaidi Junaidi

2017-11-01

The potential environmental impacts of mining, increasing environmental legislation and public awareness have received increased attention world-wide in the last two decades. The focus of concern by the industry, environmental regulatory agencies and members of the public is the systematic rehabilitation of ex-mine sites to improve the quality at site for potential future commercial land use. The minerals extracted from these mine/quarry sites are essential in the construction, semiconductor, high-technology, ceramic and other manufacturing sectors for further industrial development. However, efficient engineering design and systematic economic evaluation of mine sites for site rehabilitation are required in maintaining the expected standards of environmental compliance. With escalating production costs and the keen competitiveness of the mining industry world-wide, the necessity to increase the efficiency in site rehabilitation is gaining prominence. A coordinated environmental protection and rehabilitation programme is essential if the environmental awareness of the community and the demands of the respective planning authorities are to be accommodated. There is thus a need to increase the base of knowledge for efficient planning in the systematic and progressive rehabilitation of current and future ex-mine sites. An efficient modeling tool is required for the systematic planning and design of potential economic land development of ex-mine sites. Remote Sensing and Geographic Information System (GIS) technology is a useful tool to acquire spatial information for the systematic design and planning of potential development of ex-mine sites. This research was conducted to detect the trends in the suitability of land cover changes via land cover change detection of ex-mine sites and validated with reality. The findings are useful to assist in the development of a tool for efficient modeling and design of potential
Event-based text mining for biology and functional genomics

Science.gov (United States)

Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

2015-01-01

The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365
Coal mine subsidence

International Nuclear Information System (INIS)

Rahall, N.J.

1991-05-01

This paper examines the efficacy of the Department of the Interior's Office of Surface Mining Reclamation and Enforcement's (OSMRE) efforts to implement the federally assisted coal mine subsidence insurance program. Coal mine subsidence, a gradual settling of the earth's surface above an underground mine, can damage nearby land and property. To help protect property owners from subsidence-related damage, the Congress passed legislation in 1984 authorizing OSMRE to make grants of up to $3 million to each state to help the states establish self-sustaining, state-administered insurance programs. Of the 21 eligible states, six (Colorado, Indiana, Kentucky, Ohio, West Virginia, and Wyoming) applied for grants. This paper reviews the efforts of these six states to develop self-sustaining insurance programs and assesses OSMRE's oversight of those efforts
Practice-based evidence: profiling the safety of cilostazol by text-mining of clinical notes.

Science.gov (United States)

Leeper, Nicholas J; Bauer-Mehren, Anna; Iyer, Srinivasan V; Lependu, Paea; Olson, Cliff; Shah, Nigam H

2013-01-01

Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF). We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients. This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover 'natural experiments' such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.
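The odds ratios and confidence intervals quoted in this abstract can be read more easily with a small worked sketch. The abstract does not say how they were estimated on the propensity-matched cohort, so the code below assumes the simplest unadjusted case: an odds ratio from a 2x2 table with a Wald (log-scale) 95% interval. The group sizes (232 and 1,160) come from the abstract; the event counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical event counts: 40 of 232 treated, 200 of 1160 controls.
odds_ratio, lower, upper = odds_ratio_ci(40, 232 - 40, 200, 1160 - 200)
print(f"OR = {odds_ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that contains 1, as all three intervals in the abstract do, is consistent with no association at the 5% level.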
MINING OPERATIONS' SAFETY PROVISION - FUNDAMENTAL AND APPLIED SCIENCE TASK

Directory of Open Access Journals (Sweden)

Zakharov V.N.

2017-12-01

The stages of the formation of the modern Russian scientific school of comprehensive exploitation of mineral resources, the main directions of which were concentrated in the Institute of Comprehensive Exploitation of Mineral Resources, are considered. The main directions of ICEMR scientific activity and the most important results of fundamental and applied research are presented, which are the scientific basis of modern research related to ensuring the safe use of mineral reserves. The importance of studying the coal and methane interaction, gas dynamic phenomena in coal mines, coal seam degassing technologies and mine methane utilization, mathematical modeling and solving problems in the field of stressed-deformed state, strength, fracture mechanics, thermal conductivity, hydromechanics, forced vibration, etc. is outlined. An effectiveness analysis of the joint efforts of state, academic and industrial branch scientific centers, university science, design organizations and mining companies to reduce industrial injuries in the mining sector of the Russian economy is conducted. The need for targeted measures to move to new technical-technological and regulatory levels of mining, making it possible to prevent accidents with mass fatalities, was determined. The solution of these tasks is possible only by combining the efforts of the specialized institutes of the Russian Academy of Sciences, branch science, universities and mining companies through the implementation of the "Mining Safety" Scientific Research Comprehensive Plan, coordinated by ICEMR RAS.
Nuclear Regulatory legislation

International Nuclear Information System (INIS)

1984-06-01

This compilation of statutes and material pertaining to nuclear regulatory legislation through the 97th Congress, 2nd Session, has been prepared by the Office of the Executive Legal Director, U.S. Nuclear Regulatory Commission, with the assistance of staff, for use as an internal resource document
Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

NARCIS (Netherlands)

K.M. Hettne (Kristina); J. Boorsma (Jeffrey); D.A.M. van Dartel (Dorien A M); J.J. Goeman (Jelle); E.C. de Jong (Esther); A.H. Piersma (Aldert); R.H. Stierum (Rob); J. Kleinjans (Jos); J.A. Kors (Jan)

2013-01-01

Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with
Text processing for technical reports (direct computer-assisted origination, editing, and output of text)

Energy Technology Data Exchange (ETDEWEB)

De Volpi, A.; Fenrick, M. R.; Stanford, G. S.; Fink, C. L.; Rhodes, E. A.

1980-10-01

Documentation often is a primary residual of research and development. Because of this important role and because of the large amount of time consumed in generating technical reports, particularly those containing formulas and graphics, an existing data-processing computer system has been adapted so as to provide text-processing of technical documents. Emphasis has been on accuracy, turnaround time, and time savings for staff and secretaries, for the types of reports normally produced in the reactor development program. The computer-assisted text-processing system, called TXT, has been implemented to benefit primarily the originator of technical reports. The system is of particular value to professional staff, such as scientists and engineers, who have responsibility for generating much correspondence or lengthy, complex reports or manuscripts - especially if prompt turnaround and high accuracy are required. It can produce text that contains special Greek or mathematical symbols. Written in FORTRAN and MACRO, the program TXT operates on a PDP-11 minicomputer under the RSX-11M multitask multiuser monitor. Peripheral hardware includes videoterminals, electrostatic printers, and magnetic disks. Either data- or word-processing tasks may be performed at the terminals. The repertoire of operations has been restricted so as to minimize user training and memory burden. Secretarial staff may be readily trained to make corrections from annotated copy. Some examples of camera-ready copy are provided.
Measurement of quantitative species diversity on reclaimed coal mine lands: A brief overview of the Wyoming regulatory proposal

International Nuclear Information System (INIS)

Vincent, R.B.

1998-01-01

The Wyoming Land Quality Division (LQD) Coal Rules and Regulations require mine operators to specify quantitative procedures for evaluating postmining species diversity and composition. Currently, permit commitments range from deferring commitment to a quantitative procedure until some future date to applying various similarity/diversity indices for comparison of reclaimed lands to native vegetation communities. Therefore, the LQD began trying to develop a standardized procedure to evaluate species diversity and composition, while providing operator flexibility. Review of several technical publications on the use of similarity and diversity indices, and other measurement techniques, indicates that a consensus has not been reached on which procedure is most appropriate for use on reclaimed mine lands. In addition, implementation of many of the recommended procedures is not practical with regard to staff and data limitations. As a result, the LQD has developed an interim procedure, based on site-specific baseline data, to evaluate postmining species diversity and composition success with respect to bond release requests. This paper reviews many of the recommended procedures, outlines some of the pros and cons, and provides a specific example of how the proposed interim procedure was applied to an actual coal mine permit. Implementation of this or a similar procedure would allow for site-specific standardization of permits and regulatory requirements, thus reducing review time and reducing some of the subjectivity surrounding a component of the Wyoming bond release requirements
International assistance. Licensing assistance project

International Nuclear Information System (INIS)

Aleev, A.

1999-01-01

A description of the licensing assistance project for VATESI is presented. In the licensing of Unit No. 1 of INPP, VATESI is supported by many western countries. Experts from regulatory bodies or scientific organizations of those countries assist VATESI staff in reviewing documentation presented by INPP. In addition to bilateral cooperation, support is provided by the European Commission through the Phare programme

miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.

Science.gov (United States)

Gupta, Samir; Ross, Karen E; Tudor, Catalina O; Wu, Cathy H; Schmidt, Carl J; Vijay-Shanker, K

2016-04-29

MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. When we expanded the evaluation to
Development and testing of a text-mining approach to analyse patients' comments on their experiences of colorectal cancer care.

Science.gov (United States)

Wagland, Richard; Recio-Saucedo, Alejandra; Simon, Michael; Bracher, Michael; Hunt, Katherine; Foster, Claire; Downing, Amy; Glaser, Adam; Corner, Jessica

2016-08-01

Quality of cancer care may greatly impact on patients' health-related quality of life (HRQoL). Free-text responses to patient-reported outcome measures (PROMs) provide rich data but analysis is time and resource-intensive. This study developed and tested a learning-based text-mining approach to facilitate analysis of patients' experiences of care and develop an explanatory model illustrating impact on HRQoL. Respondents to a population-based survey of colorectal cancer survivors provided free-text comments regarding their experience of living with and beyond cancer. An existing coding framework was tested and adapted, which informed learning-based text mining of the data. Machine-learning algorithms were trained to identify comments relating to patients' specific experiences of service quality, which were verified by manual qualitative analysis. Comparisons between coded retrieved comments and a HRQoL measure (EQ5D) were explored. The survey response rate was 63.3% (21 802/34 467), of which 25.8% (n=5634) participants provided free-text comments. Of retrieved comments on experiences of care (n=1688), over half (n=1045, 62%) described positive care experiences. Most negative experiences concerned a lack of post-treatment care (n=191, 11% of retrieved comments) and insufficient information concerning self-management strategies (n=135, 8%) or treatment side effects (n=160, 9%). Associations existed between HRQoL scores and coded algorithm-retrieved comments. Analysis indicated that the mechanism by which service quality impacted on HRQoL was the extent to which services prevented or alleviated challenges associated with disease and treatment burdens. Learning-based text mining techniques were found useful and practical tools to identify specific free-text comments within a large dataset, facilitating resource-efficient qualitative analysis. This method should be considered for future PROM analysis to inform policy and practice. Study findings indicated that
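The percentages reported in this abstract follow directly from the counts it gives. A short arithmetic check is sketched below; the counts are taken from the abstract, while the variable and category names are illustrative.

```python
# Counts as reported in the abstract.
surveyed = 34467
responded = 21802
commented = 5634
care_comments = 1688

# Subsets of the retrieved comments on experiences of care.
subsets = {
    "positive care experiences": 1045,
    "lack of post-treatment care": 191,
    "insufficient self-management information": 135,
    "insufficient side-effect information": 160,
}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


response_rate = pct(responded, surveyed)   # reported as 63.3%
comment_rate = pct(commented, responded)   # reported as 25.8%
shares = {name: pct(n, care_comments) for name, n in subsets.items()}

print(f"response rate: {response_rate:.1f}%")
print(f"free-text comment rate: {comment_rate:.1f}%")
for name, share in shares.items():
    print(f"{name}: {share:.0f}% of retrieved comments")
```

Note that the subset percentages use the 1688 retrieved care comments as the denominator, not the 5634 participants who left comments.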
DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

Science.gov (United States)

Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

2015-01-01

Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is provided. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Regulatory view of the close-out of the uranium ore mine Zirovski Vrh

International Nuclear Information System (INIS)

Vrankar, L.

2005-01-01

The production of the uranium mine Zirovski Vrh ceased in 1990. The main remaining problem of the remediation is the mine and mill tailings. The uranium mine Zirovski Vrh has one mill tailings site, Borst, and one waste pile, Jazbec. According to the Act on protection against ionising radiation and nuclear safety, which was adopted by the Parliament in 2002, they are classified as radiation facilities. The Slovenian Nuclear Safety Administration (SNSA) is authorised to issue a mandatory consent to mining work. The SNSA prepared the initial proposal of the content of the safety report for the mine waste pile Jazbec. In 2005, according to the detailed content of this document, the public company Zirovski Vrh Ltd prepared the safety report, which was examined by an authorised expert for radiation and nuclear safety. After a careful revision of the safety evaluation report, the consent for mining work shall be issued by the SNSA. After the mining works are finished, the SNSA shall also issue a licence for the closure of the waste pile Jazbec. The main goal of this article is to present the Slovenian regulations which also cover mining work in the field of close-out of the uranium ore mine. (author)
Annotated chemical patent corpus: a gold standard for text mining.

Directory of Open Access Journals (Sweden)

Saber A Akhondi

Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
Construction accident narrative classification: An evaluation of text mining techniques.

Science.gov (United States)

Goh, Yang Miang; Ubeynarayana, C U

2017-11-01

Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM was also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with unigram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
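The per-label precision, recall and F1 figures in this abstract are standard one-vs-rest metrics. The sketch below shows how such figures are computed from true and predicted labels. It is not the study's code, and the label names and predictions are hypothetical (the abstract mentions 11 labels without naming them).

```python
def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed one-vs-rest."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical labels for six narratives, for illustration only.
y_true = ["fall", "fall", "fall", "struck", "struck", "other"]
y_pred = ["fall", "fall", "struck", "struck", "other", "other"]

scores = per_label_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is computed per label, the reported ranges for precision, recall and F1 need not come from the same label.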
Environmental regulation of exploration and mining operations in Asian countries

International Nuclear Information System (INIS)

Otto, J.; Naito, Koh; Pring, G.

1999-01-01

This paper offers a new perspective on the environmental laws in Asian nations affecting the exploration, mining, and reclamation activities of the mineral resource industry: the perspective of the senior government officials in those countries, whose job is to enforce these new environmental laws. The article presents the results of a 1998 survey of national environmental officials in Asia conducted by the Colorado School of Mines and the Metal Mining Agency of Japan. Officials in 10 diverse countries - Cambodia, China, Indonesia, Lao PDR, Malaysia, Myanmar, Mongolia, Philippines, Thailand and Vietnam - responded to a detailed questionnaire covering applicable laws, agencies, protected areas, covered mineral activities, financial assurance, environmental impact assessment, public involvement, environmental standards, permit and reclamation requirements. The survey confirms that Asian nations are part of the global trend towards national government regulatory structures that balance mineral development objectives with environmental considerations. The survey also shows developing regulatory systems (some embryonic, some more mature) utilizing a combination of mining and environmental acts, and often an 'insider' perspective of the national officials administering the laws. While that perspective is not without its biases (not least the rigor of enforcement), it may nevertheless be of use in company planning. The emerging regulatory picture contradicts the conventional notion that it is the 'lower' level of regulation in Asia that is attracting foreign direct investment in mining. (author)
Nabarlek uranium mine: from EIS to decommissioning

International Nuclear Information System (INIS)

Waggitt, P.W.

2000-01-01

The Nabarlek uranium mine operated in Northern Australia from 1979 until 1989 and was the first of the 'new generation' of uranium mines to go through the cycle of EIS, operation and decommissioning. The paper describes the environmental and operational approval processes, the regulatory regime and the decommissioning procedures at the mine. The mine was located on land owned by indigenous Aboriginal people and so there were serious cultural considerations to be taken into account throughout the mine's life. Site work for decommissioning and rehabilitation was completed in 1995 but revegetation assessment has continued until the present time (1999). The paper concludes with the latest assessment and monitoring data and discusses the lessons learned by all parties from the completion of the cycle of mine life 'from cradle to grave'. (author)
Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

NARCIS (Netherlands)

Hettne, K.M.; Boorsma, A.; Dartel, D.A. van; Goeman, J.J.; Jong, E. de; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

2013-01-01

BACKGROUND: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set
Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data

NARCIS (Netherlands)

Hettne, K.M.; Boorsma, A.; Dartel, van D.A.M.; Goeman, J.J.; Jong, de E.; Piersma, A.H.; Stierum, R.H.; Kleinjans, J.C.; Kors, J.A.

2013-01-01

Background: Availability of chemical response-specific lists of genes (gene sets) for pharmacological and/or toxic effect prediction for compounds is limited. We hypothesize that more gene sets can be created by next-generation text mining (next-gen TM), and that these can be used with gene set
Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media

OpenAIRE

Chen, Hao; McKeever, Susan; Delany, Sarah Jane

2016-01-01

The issues of cyberbullying and online harassment have gained considerable coverage in the last number of years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat - using datasets from Twitter, YouT...
Mining

Directory of Open Access Journals (Sweden)

Khairullah Khan

2014-09-01

Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.
What Online Communities Can Tell Us About Electronic Cigarettes and Hookah Use: A Study Using Text Mining and Visualization Techniques.

Science.gov (United States)

Chen, Annie T; Zhu, Shu-Hong; Conway, Mike

2015-09-29

The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people's experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a novel approach to understanding the use and appeal of these emerging products by applying text mining techniques to compare consumer experiences across discussion forums. This study examined content from the websites Vapor Talk, Hookah Forum, and Reddit to understand people's experiences with different tobacco products. Our investigation involves three parts. First, we identified contextual factors that inform our understanding of tobacco use behaviors, such as setting, time, social relationships, and sensory experience, and compared the forums to identify the ones where content on these factors is most common. Second, we compared how the tobacco use experience differs with combustible cigarettes and e-cigarettes. Third, we investigated differences between e-cigarette and hookah use. In the first part of our study, we employed a lexicon-based extraction approach to estimate prevalence of contextual factors, and then we generated a heat map based on these estimates to compare the forums. In the second and third parts of the study, we employed a text mining technique called topic modeling to identify important topics and then developed a visualization, Topic Bars, to compare topic coverage across forums. In the first part of the study, we identified two forums, Vapor Talk Health & Safety and the Stopsmoking subreddit, where discussion concerning contextual factors was particularly common. The second part showed that the discussion in Vapor Talk Health & Safety focused on symptoms and comparisons of combustible cigarettes and e
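The lexicon-based prevalence estimate described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the lexicons, forum names and posts are hypothetical, and the abstract does not say how the study tokenized posts or defined prevalence. Here prevalence is taken to be the share of posts in a forum that contain at least one lexicon term for a factor; the resulting forum-by-factor grid is the kind of matrix a heat map could be drawn from.

```python
import re

# Hypothetical lexicons; the study's actual term lists are not given in the abstract.
LEXICONS = {
    "setting": {"home", "car", "bar", "work"},
    "time": {"morning", "night", "weekend"},
    "social": {"friend", "friends", "wife", "husband", "family"},
}


def tokenize(text):
    """Lowercase a post and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def factor_prevalence(posts_by_forum, lexicons=LEXICONS):
    """Return {forum: {factor: share of posts containing >= 1 lexicon term}}."""
    result = {}
    for forum, posts in posts_by_forum.items():
        token_sets = [tokenize(p) for p in posts]
        result[forum] = {
            factor: (
                sum(1 for tokens in token_sets if tokens & terms) / len(token_sets)
                if token_sets
                else 0.0
            )
            for factor, terms in lexicons.items()
        }
    return result


# Toy posts (invented for illustration only).
posts = {
    "forum_a": ["I vape at home every morning", "My friends switched too"],
    "forum_b": ["Hookah at the bar on the weekend with friends", "Flavors are great"],
}
prevalence = factor_prevalence(posts)
for forum, row in prevalence.items():
    print(forum, row)
```

Counting posts rather than raw term occurrences keeps one long post from dominating a forum's estimate; whether the original study made the same choice is not stated.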
LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes.

Science.gov (United States)

Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin

2017-07-03

Considerable effort has been devoted to systematically retrieving information on genes and proteins as well as the relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
The North American Bats and Mines Project: a cooperative approach for integrating wildlife, ecosystem management, and mine land reclamation

Energy Technology Data Exchange (ETDEWEB)

Taylor, D.A.R. [Bat Conservation International, Austin, TX (United States)]

1995-06-01

Abandoned underground mines in North America provide a habitat for bats. Closure of mines without conducting biological surveys can endanger bat species that are abundant. The North American Bats and Mines Project (NABMP) has been created by the U.S. Bureau of Land Management and Bat Conservation International to provide coordination among government and conservation organizations and the mining industry in order to minimize the loss of bats living in mines. NABMP provides coordination through education on the importance of mines for bat populations by providing training on mine assessment and closure methods, by assisting with protection and improvement of abandoned mine roosts, and by developing methods for creating new bat habitat. 1 tab.
Seqenv: linking sequences to environments through text mining.

Science.gov (United States)

Sinclair, Lucas; Ijaz, Umer Z; Jensen, Lars Juhl; Coolen, Marco J L; Gubry-Rangin, Cecile; Chroňáková, Alica; Oulas, Anastasis; Pavloudi, Christina; Schnetzer, Julia; Weimann, Aaron; Ijaz, Ali; Eiler, Alexander; Quince, Christopher; Pafilis, Evangelos

2016-01-01

Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts (if it is available) the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. To install seqenv, go to: https://github.com/xapple/seqenv.
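The "weighted summation" step mentioned in this abstract could look something like the sketch below. It does not use seqenv's own code or API, and the weighting scheme is an assumption: each sequence's environment-term counts are normalized to frequencies and then weighted by that sequence's relative abundance in the sample. The function name, sequence identifiers, term labels and numbers are all hypothetical.

```python
from collections import defaultdict


def summarize_sample(seq_term_counts, seq_abundance):
    """Combine per-sequence environment-term counts into one sample profile.

    seq_term_counts: {sequence id: {environment term: number of hits}}
    seq_abundance:   {sequence id: read count in the sample}
    Assumed weighting (not taken from seqenv): term frequency per sequence
    multiplied by the sequence's share of the sample's total abundance.
    """
    sample = defaultdict(float)
    total_abundance = sum(seq_abundance.values())
    for seq_id, counts in seq_term_counts.items():
        n_hits = sum(counts.values())
        if n_hits == 0 or total_abundance == 0:
            continue  # nothing to contribute for this sequence
        weight = seq_abundance.get(seq_id, 0) / total_abundance
        for term, count in counts.items():
            sample[term] += weight * count / n_hits
    return dict(sample)


# Toy input (invented for illustration only).
counts = {"otu1": {"soil": 3, "marine sediment": 1}, "otu2": {"sea water": 2}}
abundance = {"otu1": 30, "otu2": 10}
profile = summarize_sample(counts, abundance)
print(profile)
```

With this normalization the profile of a sample sums to 1 whenever every sequence has at least one annotated hit, which makes samples of different sequencing depth comparable.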
Studies on medicinal herbs for cognitive enhancement based on the text mining of Dongeuibogam and preliminary evaluation of its effects.

Science.gov (United States)

Pak, Malk Eun; Kim, Yu Ri; Kim, Ha Neui; Ahn, Sung Min; Shin, Hwa Kyoung; Baek, Jin Ung; Choi, Byung Tae

2016-02-17

In literature on Korean medicine, Dongeuibogam (Treasured Mirror of Eastern Medicine), published in 1613, represents the overall results of the traditional medicines of North-East Asia based on prior medicinal literature of this region. We utilized this medicinal literature by text mining to establish a list of candidate herbs for cognitive enhancement in the elderly and then performed an evaluation of their effects. Text mining was performed for selection of candidate herbs. Cell viability was determined in HT22 hippocampal cells and immunohistochemistry and behavioral analysis was performed in a kainic acid (KA) mice model in order to observe alterations of hippocampal cells and cognition. Twenty four herbs for cognitive enhancement in the elderly were selected by text mining of Dongeuibogam. In HT22 cells, pretreatment with 3 candidate herbs resulted in significantly reduced glutamate-induced cell death. Panax ginseng was the most neuroprotective herb against glutamate-induced cell death. In the hippocampus of a KA mice model, pretreatment with 11 candidate herbs resulted in suppression of caspase-3 expression. Treatment with 7 candidate herbs resulted in significantly enhanced expression levels of phosphorylated cAMP response element binding protein. Number of proliferated cells indicated by BrdU labeling was increased by treatment with 10 candidate herbs. Schisandra chinensis was the most effective herb against cell death and proliferation of progenitor cells and Rehmannia glutinosa in neuroprotection in the hippocampus of a KA mice model. In a KA mice model, we confirmed improved spatial and short memory by treatment with the 3 most effective candidate herbs and these recovered functions were involved in a higher number of newly formed neurons from progenitor cells in the hippocampus. These established herbs and their combinations identified by text-mining technique and evaluation for effectiveness may have value in further experimental and clinical
A preliminary approach to creating an overview of lactoferrin multi-functionality utilizing a text mining method.

Science.gov (United States)

Shimazaki, Kei-ichi; Kushida, Tatsuya

2010-06-01

Lactoferrin is a multi-functional metal-binding glycoprotein that exhibits many biological functions of interest to many researchers from the fields of clinical medicine, dentistry, pharmacology, veterinary medicine, nutrition and milk science. To date, a number of academic reports concerning the biological activities of lactoferrin have been published and are easily accessible through public data repositories. However, as the literature is expanding daily, this presents challenges in understanding the larger picture of lactoferrin function and mechanisms. In order to overcome the "analysis paralysis" associated with lactoferrin information, we attempted to apply a text mining method to the accumulated lactoferrin literature. To this end, we used the information extraction system GENPAC (provided by Nalapro Technologies Inc., Tokyo). This information extraction system uses natural language processing and text mining technology. This system analyzes the sentences and titles from abstracts stored in the PubMed database, and can automatically extract binary relations that consist of interactions between genes/proteins, chemicals and diseases/functions. We expect that such information visualization analysis will be useful in determining novel relationships among a multitude of lactoferrin functions and mechanisms. We have demonstrated the utilization of this method to find pathways of lactoferrin participation in neovascularization, Helicobacter pylori attack on gastric mucosa, atopic dermatitis and lipid metabolism.
Carbon Sequestration on Surface Mine Lands

Energy Technology Data Exchange (ETDEWEB)

Donald Graves; Christopher Barton; Richard Sweigard; Richard Warner; Carmen Agouridis

2006-03-31

Since the implementation of the federal Surface Mining Control and Reclamation Act of 1977 (SMCRA) in May of 1978, many opportunities have been lost for the reforestation of surface mines in the eastern United States. Research has shown that excessive compaction of spoil material in the backfilling and grading process is the biggest impediment to the establishment of productive forests as a post-mining land use (Ashby, 1998, Burger et al., 1994, Graves et al., 2000). Stability of mine sites was a prominent concern among regulators and mine operators in the years immediately following the implementation of SMCRA. These concerns resulted in the highly compacted, flatly graded, and consequently unproductive spoils of the early post-SMCRA era. However, there is nothing in the regulations that requires mine sites to be overly compacted as long as stability is achieved. It has been cultural barriers and not regulatory barriers that have contributed to the failure of reforestation efforts under the federal law over the past 27 years. Efforts to change the perception that the federal law and regulations impede effective reforestation techniques and interfere with bond release must be implemented. Demonstration of techniques that lead to the successful reforestation of surface mines is one such method that can be used to change perceptions and protect the forest ecosystems that were indigenous to these areas prior to mining. The University of Kentucky initiated a large-scale reforestation effort to address regulatory and cultural impediments to forest reclamation in 2003. During the three years of this project 383,000 trees were planted on over 556 acres in different physiographic areas of Kentucky (Table 1, Figure 1). Species used for the project were similar to those that existed on the sites before mining was initiated (Table 2). A monitoring program was undertaken to evaluate growth and survival of the planted species as a function of spoil characteristics and
E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter

Science.gov (United States)

2016-01-01

Background: As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. Objective: This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. Methods: Text mining uncovered key patterns and important topics for e-cigarettes on Twitter. SAS Text Miner 12.1 software (SAS Institute Inc) was used for descriptive text mining to reveal the primary topics from tweets collected from March 24, 2015, to July 3, 2015, using a Python script in conjunction with Twitter’s streaming application programming interface. A total of 18 keywords related to e-cigarettes were used and resulted in a total of 872,544 tweets that were sorted into overarching themes through a text topic node for tweets (126,127) and retweets (114,451) that represented more than 1% of the conversation. Results: While some of the final themes were marketing-focused, many topics represented diverse proponent and user conversations that included discussion of policies, personal experiences, and the differentiation of e-cigarettes from traditional tobacco, often by pointing to the lack of evidence for the harm or risks of e-cigarettes or taking the position that e-cigarettes should be promoted as smoking cessation devices. Conclusions: These findings reveal that unique, large-scale public conversations are occurring on Twitter alongside e-cigarette advertising and promotion. Proponents and users are turning to social media to share knowledge, experience, and questions about e-cigarette use. Future research should focus on these unique conversations to understand how they influence attitudes towards and use of e-cigarettes. PMID:27956376

The ATEE action during the drawing up of regulatory texts; L'action de l'ATEE lors de l'elaboration des textes reglementaires

Energy Technology Data Exchange (ETDEWEB)

Jacubowiez, I. [Groupe de Travail Environnement de ATEE, Association Technique Energie Environnement, 94 - Arcueil (France)]

1997-12-31

The role of the French Energy Environment Technical Association (ATEE) is to promote the rational and efficient use of energy. In this context, the ATEE participated in the drafting of regulatory texts, in particular the revision of the nomenclature of classified installations and the decrees arising from the clean air acts and from the rational use of energy. This paper describes the actions the ATEE carried out on both topics. (J.S.)
Roles for text mining in protein function prediction.

Science.gov (United States)

Verspoor, Karin M

2014-01-01

The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computational protein function prediction has therefore emerged as a problem at the forefront of modern biology (Radivojac et al., Nat Methods 10(13):221-227, 2013). Within the varied approaches to computational protein function prediction that have been explored, there are several that make use of biomedical literature mining. These methods take advantage of information in the published literature to associate specific proteins with specific protein functions. In this chapter, we introduce two main strategies for doing this: association of function terms, represented as Gene Ontology terms (Ashburner et al., Nat Genet 25(1):25-29, 2000), to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.
French uranium mining sites remediation

International Nuclear Information System (INIS)

Roche, M.

2002-01-01

Following a presentation of COGEMA's general policy for the remediation of uranium mining sites and the regulatory requirements, the current phases of site remediation operations are described. Specific operations for underground mines, open pits, milling facilities and confining the milled residues to meet long term public health concerns are detailed and discussed in relation to the communication strategies to show and explain the actions of COGEMA. A brief review of the current remediation situation at the various French facilities is finally presented. (author)
75 FR 71668 - Cibola National Forest, Mount Taylor Ranger District, NM, Roca Honda Mine

Science.gov (United States)

2010-11-24

... develop and conduct underground uranium mining operations on their mining claims on and near Jesus Mesa in... open to mineral entry under the General Mining Law of 1872. Section 16 is State of New Mexico land, which is not subject to the regulatory jurisdiction of the Forest Service. Roca Honda proposes a mine...
Managing Environmental and Health Impacts of Uranium Mining

International Nuclear Information System (INIS)

Vance, Robert; Hinton, Nicole; Huffman, Dale; Harris, Frank; Arnold, Nikolas; Ruokonen, Eeva; Jakubick, Alexander; Tyulyubayev, Zekail; Till, William von; Woods, Peter; Hall, Susan; Da Silva, Felipe; Vostarek, Pavel

2014-01-01

Uranium is the raw material used to produce fuel for nuclear power plants that generate significant amounts of electricity with life cycle carbon emissions that are as low as renewable energy sources. However, the mining of this valuable energy commodity remains controversial, principally because of environmental and health impacts associated with the early years of uranium mining. Maximising production in the face of rapidly rising demand was the principal goal of uranium mining at the time, with little concern given to properly managing environmental and health impacts. Today, societal expectations and regulation of the industry are directed much more towards radiation protection, environmental stewardship, health and safety. With over 430 operational reactors in the world, nuclear fuel will be required for many decades in order to meet requirements to fuel the existing fleet and demand created by new reactors, given the projected growth in nuclear generating capacity, particularly in the developing world. New mines will in turn be needed. As a result, enhancing awareness of leading practices in uranium mining is increasingly important. This report aims to dispel some of the myths, fears and misconceptions about uranium mining by providing an overview of how leading practice mining can significantly reduce all impacts compared to the early strategic period. It also provides a non-technical overview of leading practices, the regulatory environment in which mining companies operate and the outcomes of implementing such practices. Societal expectations related to environmental protection and the safety of workers and the public evolved considerably as the outcomes of the early era of mining became apparent, driving changes in regulatory oversight and mining practices. Uranium mining is now conducted under significantly different circumstances, with leading practice mining the most regulated and one of the safest and environmentally responsible forms of mining in the
Best practice in situ recovery uranium mining in Australia

International Nuclear Information System (INIS)

Lambert, I.B.; McKay, A.D.; Carson, L.J.

2010-01-01

The Australian Government's policy is to ensure that uranium mining, milling and rehabilitation are based on world best practice standards. A best practice guide for in situ recovery (ISR) uranium mining has been developed to communicate the Australian Government's expectations, with a view to achieving greater certainty that ISR mining projects meet Australian Government policy and greater consistency in the assessment of ISR mine proposals within multiple government regulatory processes. The guide focuses on the main perceived risks: impacts on groundwaters, disposal of mining residues, and radiation protection. World best practice does not amount to a universal template for ISR mining because the characteristics of individual ore bodies determine the best practice. (author)
An unsupervised text mining method for relation extraction from biomedical literature.

Directory of Open Access Journals (Sweden)

Changqin Quan

The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. The pattern clustering algorithm is based on the polynomial kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein-protein interaction extraction, and (2) gene-suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods. The three supervised methods are rule-based, SVM-based, and kernel-based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation on gene-suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than the co-occurrence based method.
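For orientation, the co-occurrence based method that serves as the comparison point in this abstract is commonly implemented along the lines of the sketch below, together with the F-score used to compare methods. This is not the paper's code: it assumes sentence-level co-occurrence with simple substring matching, and the text, entity list and gold pairs are toy data invented for illustration.

```python
import re
from itertools import combinations


def cooccurrence_pairs(text, entities):
    """Predict a relation for every pair of entities named in the same sentence."""
    pairs = set()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        found = sorted(e for e in entities if e.lower() in lowered)
        pairs.update(combinations(found, 2))
    return pairs


def f_score(predicted, gold):
    """Balanced F-score (harmonic mean of precision and recall) over pair sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical annotations, pairs stored in sorted order).
text = "RAD51 binds BRCA2 in vitro. TP53 and MDM2 were measured, as was BRCA2."
entities = {"RAD51", "BRCA2", "TP53", "MDM2"}
gold = {("BRCA2", "RAD51"), ("MDM2", "TP53")}

predicted = cooccurrence_pairs(text, entities)
print(sorted(predicted))
print(round(f_score(predicted, gold), 3))
```

The second toy sentence shows the typical weakness of this baseline: every entity mentioned alongside a true pair is also paired up, so recall is high but precision drops.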
Environmental impact of uranium mining and milling

International Nuclear Information System (INIS)

Dory, A.B.

1981-08-01

The author introduces the subject with an overview of the regulatory requirements and philosophy applied to uranium mines and mills. The special attention given to tailings management is highlighted, and a discussion of the basic environmental concerns is concluded with an itemizing of the main tasks facing the AECB. The extent of the environmental impact of uranium mining, milling and waste management is illustrated with specific details pertaining to mines in the Elliot Lake area. The author concludes that the impact on the ground and surface water system is not alarming, and the impact on air quality is not significant beyond a few hundred metres from the mining facilities. The publicly perceived impact is discussed, followed by a rationale for the continued licensing of new uranium mining operations complete with tailings management facilities.
Control of radon daughters in underground mining

International Nuclear Information System (INIS)

Swent, L.W.

1983-01-01

This paper discusses technical developments that may enable uranium mine operators to improve engineering controls of radon daughter concentrations in mines, and developments in regulatory controls. The origin of radon daughters in underground mines is explained. The procedure for sampling and determining the concentration of alpha radiation in sampled air is reviewed. The principal technical development in the last few years has been the perfection and use of a class of meters which determine radon daughter concentrations in an air sample in a matter of two or three minutes without any aging period. A number of underground uranium mine operators are now using "instant" type meters and the Mine Safety and Health Administration (MSHA) has approved their use in a number of mines. The difficulty experienced by uranium mine operators in complying with an MSHA regulation which requires that no person be exposed to radon daughter concentrations exceeding 1 Working Level (WL) in any active working place is discussed.
30 CFR 906.15 - Approval of Colorado regulatory program amendments.

Science.gov (United States)

2010-07-01

... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Approval of Colorado regulatory program amendments. 906.15 Section 906.15 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR PROGRAMS FOR THE CONDUCT OF SURFACE MINING OPERATIONS WITHIN EACH STATE COLORADO...
Sustainable Mining Environment: Technical Review of Post-mining Plans

Directory of Open Access Journals (Sweden)

Restu Juniah

2017-12-01

The mining industry exists because humans need mining commodities to meet their daily needs such as motor vehicles, mobile phones, electronic equipment and others. Mining commodities as mentioned in Government Regulation No. 23 of 2010 on Implementation of Mineral and Coal Mining Business Activities are radioactive minerals, metal minerals, nonmetallic minerals, rocks and coal. Mineral and coal mining is conducted to obtain the mining commodities through production operations. Mineral and coal mining companies have an obligation to ensure that the mining environment is sustained, in particular after production operations end (post-mining). This survey research aims to examine technically the post-mining plan for the coal mine of PT Samantaka Batubara in Indragiri Hulu Regency, Riau Province, with respect to the sustainability of the mining environment. The results indicate that the post-mining plan of PT Samantaka Batubara has met the technical aspects required in post-mining planning for a sustainable mining environment. Postponement of post-mining land of PT Samantaka Batubara for garden and forest zone. The results of this study are expected to be useful and can be used by stakeholders, academics, researchers, practitioners and associations of mining, and the environment.
LocText

DEFF Research Database (Denmark)

Cejuela, Juan Miguel; Vinchurkar, Shrikant; Goldberg, Tatyana

2018-01-01

trees and was trained and evaluated on a newly improved LocTextCorpus. Combined with an automatic named-entity recognizer, LocText achieved high precision (P = 86%±4). After completing development, we mined the latest research publications for three organisms: human (Homo sapiens), budding yeast...
Impact of Text-Mining and Imitating Strategies on Lexical Richness, Lexical Diversity and General Success in Second Language Writing

Science.gov (United States)

Çepni, Sevcan Bayraktar; Demirel, Elif Tokdemir

2016-01-01

This study aimed to find out the impact of "text mining and imitating" strategies on lexical richness, lexical diversity and general success of students in their compositions in second language writing. The participants were 98 students studying their first year in Karadeniz Technical University in English Language and Literature…
Development of Workshops on Biodiversity and Evaluation of the Educational Effect by Text Mining Analysis

Science.gov (United States)

Baba, R.; Iijima, A.

2014-12-01

Conservation of biodiversity is one of the key issues in environmental studies. As a means to address this issue, education is becoming increasingly important. In previous work, we developed a course of workshops on the conservation of biodiversity. To disseminate the course as a tool for environmental education, determination of the educational effect is essential. Text mining enables analyses of the frequency and co-occurrence of words in freely described texts. This study is intended to evaluate the effect of the workshop by using a text mining technique. We hosted the originally developed workshop on the conservation of biodiversity for 22 college students. The aim of the workshop was to convey the definition of biodiversity. Generally, biodiversity refers to the diversity of ecosystems, diversity between species, and diversity within species. To facilitate discussion, supplementary materials were used. For instance, field guides of wildlife species were used to discuss the diversity of ecosystems. Moreover, a hierarchical framework in an ecological pyramid was shown for understanding the role of diversity between species. In addition, we offered a document on the historical Potato Famine in Ireland to discuss the diversity within species from the genetic viewpoint. Before and after the workshop, we asked students for a free description of the definition of biodiversity, and analyzed the responses using Tiny Text Miner. This technique enables Japanese language morphological analysis. Frequently used words were sorted into categories. Moreover, a principal component analysis was carried out. After the workshop, the frequency of the words tagged to diversity between species and diversity within species significantly increased. From the principal component analysis, the 1st component consists of words such as producer, consumer, decomposer, and food chain. This indicates that the students have comprehended the close relationship between
E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter.

Science.gov (United States)

Lazard, Allison J; Saffer, Adam J; Wilcox, Gary B; Chung, Arnold DongWoo; Mackert, Michael S; Bernhardt, Jay M

2016-12-12

As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. Text mining uncovered key patterns and important topics for e-cigarettes on Twitter. SAS Text Miner 12.1 software (SAS Institute Inc) was used for descriptive text mining to reveal the primary topics from tweets collected from March 24, 2015, to July 3, 2015, using a Python script in conjunction with Twitter's streaming application programming interface. A total of 18 keywords related to e-cigarettes were used and resulted in a total of 872,544 tweets that were sorted into overarching themes through a text topic node for tweets (126,127) and retweets (114,451) that represented more than 1% of the conversation. While some of the final themes were marketing-focused, many topics represented diverse proponent and user conversations that included discussion of policies, personal experiences, and the differentiation of e-cigarettes from traditional tobacco, often by pointing to the lack of evidence for the harm or risks of e-cigarettes or taking the position that e-cigarettes should be promoted as smoking cessation devices. These findings reveal that unique, large-scale public conversations are occurring on Twitter alongside e-cigarette advertising and promotion. Proponents and users are turning to social media to share knowledge, experience, and questions about e-cigarette use. Future research should focus on these unique conversations to understand how they influence attitudes towards and use of e-cigarettes. ©Allison J Lazard, Adam J Saffer, Gary B Wilcox, Arnold DongWoo Chung, Michael S Mackert, Jay M Bernhardt. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 12.12.2016.
Life priorities in the HIV-positive Asians: a text-mining analysis in young vs. old generation.

Science.gov (United States)

Chen, Wei-Ti; Barbour, Russell

2017-04-01

HIV/AIDS is one of the most urgent and challenging public health issues, especially since it is now considered a chronic disease. In this project, we used text mining techniques to extract meaningful words and word patterns from 45 transcribed in-depth interviews of people living with HIV/AIDS (PLWHA) conducted in Taipei, Beijing, Shanghai, and San Francisco from 2006 to 2013. Text mining analysis can predict whether an emerging field will become a long-lasting source of academic interest or whether it is simply a passing source of interest that will soon disappear. The data were analyzed by age group (45 and older vs. 44 and younger). The highest ranking fragments in the order of frequency were: "care", "daughter", "disease", "family", "HIV", "hospital", "husband", "medicines", "money", "people", "son", "tell/disclosure", "thought", "want", and "years". Participants in the 44-year-old and younger group were focused mainly on disease disclosure, their families, and their financial condition. In older PLWHA, social supports were one of the main concerns. In this study, we learned that different age groups perceive the disease differently. Therefore, when designing interventions, researchers should consider tailoring an intervention to a specific population to help PLWHA achieve a better quality of life. Promoting self-management can be an effective strategy for every encounter with HIV-positive individuals.
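The age-group comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the stop-word list, the function names `term_counts` and `counts_by_group`, and the toy transcripts are all assumptions made for illustration, and only the cutoff at age 45 comes from the abstract.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "my", "i", "is", "about", "for"}

def term_counts(transcripts):
    """Count non-stop-word tokens across a list of transcript strings."""
    counts = Counter()
    for text in transcripts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

def counts_by_group(interviews, cutoff=45):
    """Split (age, transcript) pairs at the cutoff age and count terms per group."""
    younger = [text for age, text in interviews if age < cutoff]
    older = [text for age, text in interviews if age >= cutoff]
    return {"younger": term_counts(younger), "older": term_counts(older)}

# Invented toy data, not taken from the study.
interviews = [
    (30, "I worry about money and whether to tell my family"),
    (52, "The hospital and my daughter care for me"),
    (61, "My son brings the medicines from the hospital"),
]
groups = counts_by_group(interviews)
print(groups["older"].most_common(3))
```

Comparing the top-ranked terms of the two counters is one simple way to see how the vocabulary of the groups differs.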
Combining QSAR Modeling and Text-Mining Techniques to Link Chemical Structures and Carcinogenic Modes of Action

Science.gov (United States)

Papamokos, George; Silins, Ilona

2016-01-01

There is an increasing need for new reliable non-animal based methods to predict and test toxicity of chemicals. Quantitative structure-activity relationship (QSAR), a computer-based method linking chemical structures with biological activities, is used in predictive toxicology. In this study, we tested the approach to combine QSAR data with literature profiles of carcinogenic modes of action automatically generated by a text-mining tool. The aim was to generate data patterns to identify associations between chemical structures and biological mechanisms related to carcinogenesis. Using these two methods, individually and combined, we evaluated 96 rat carcinogens of the hematopoietic system, liver, lung, and skin. We found that skin and lung rat carcinogens were mainly mutagenic, while the group of carcinogens affecting the hematopoietic system and the liver also included a large proportion of non-mutagens. The automatic literature analysis showed that mutagenicity was a frequently reported endpoint in the literature of these carcinogens, however, less common endpoints such as immunosuppression and hormonal receptor-mediated effects were also found in connection with some of the carcinogens, results of potential importance for certain target organs. The combined approach, using QSAR and text-mining techniques, could be useful for identifying more detailed information on biological mechanisms and the relation with chemical structures. The method can be particularly useful in increasing the understanding of structure and activity relationships for non-mutagens. PMID:27625608
A Standard Characterization Methodology for Respirable Coal Mine Dust Using SEM-EDX

Directory of Open Access Journals (Sweden)

Rachel Sellaro

2015-12-01

A key consideration for responsible development of mineral and energy resources is the well-being of workers. Respirable dust in mining environments represents a serious concern for occupational health. In particular, coal miners can be exposed to a variety of dust characteristics depending on their work activities, and some exposures may pose risk for lung diseases like CWP and silicosis. As underscored by common regulatory frameworks, respirable dust exposures are generally characterized on the basis of total mass concentration, and also the silica mass fraction. However, relatively little emphasis has been placed on other dust characteristics that may be important in terms of identifying health risks. Comprehensive particle-level analysis to estimate chemistry, size, and shape distributions of particles is possible. This paper describes a standard methodology for characterization of respirable coal mine dust using scanning electron microscopy (SEM) with energy dispersive X-ray (EDX). Preliminary verification of the method is shown based on several dust samples collected from an underground mine in Central Appalachia.
Chemical Topic Modeling: Exploring Molecular Data Sets Using a Common Text-Mining Approach.

Science.gov (United States)

Schneider, Nadine; Fechner, Nikolas; Landrum, Gregory A; Stiefl, Nikolaus

2017-08-28

Big data is one of the key transformative factors which increasingly influences all aspects of modern life. Although this transformation brings vast opportunities it also generates novel challenges, not the least of which is organizing and searching this data deluge. The field of medicinal chemistry is not different: more and more data are being generated, for instance, by technologies such as DNA encoded libraries, peptide libraries, text mining of large literature corpora, and new in silico enumeration methods. Handling those huge sets of molecules effectively is quite challenging and requires compromises that often come at the expense of the interpretability of the results. In order to find an intuitive and meaningful approach to organizing large molecular data sets, we adopted a probabilistic framework called "topic modeling" from the text-mining field. Here we present the first chemistry-related implementation of this method, which allows large molecule sets to be assigned to "chemical topics" and investigating the relationships between those. In this first study, we thoroughly evaluate this novel method in different experiments and discuss both its disadvantages and advantages. We show very promising results in reproducing human-assigned concepts using the approach to identify and retrieve chemical series from sets of molecules. We have also created an intuitive visualization of the chemical topics output by the algorithm. This is a huge benefit compared to other unsupervised machine-learning methods, like clustering, which are commonly used to group sets of molecules. Finally, we applied the new method to the 1.6 million molecules of the ChEMBL22 data set to test its robustness and efficiency. In about 1 h we built a 100-topic model of this large data set in which we could identify interesting topics like "proteins", "DNA", or "steroids". Along with this publication we provide our data sets and an open-source implementation of the new method (CheTo) which

Frac Sand Mines Are Preferentially Sited in Unzoned Rural Areas.

Directory of Open Access Journals (Sweden)

Christina Locke

Shifting markets can cause unexpected, stochastic changes in rural landscapes that may take local communities by surprise. Preferential siting of new industrial facilities in poor areas or in areas with few regulatory restrictions can have implications for environmental sustainability, human health, and social justice. This study focuses on frac sand mining, the mining of high-quality silica sand used in hydraulic fracturing processes for gas and oil extraction. Frac sand mining gained prominence in the 2000s in the upper midwestern United States where nonmetallic mining is regulated primarily by local zoning. I asked whether frac sand mines were more commonly sited in rural townships without formal zoning regulations or planning processes than in those that undertook zoning and planning before the frac sand boom. I also asked if mine prevalence was correlated with socioeconomic differences across townships. After creating a probability surface to map areas most suitable for frac sand mine occurrence, I developed neutral landscape models from which to compare actual mine distributions in zoned and unzoned areas at three different spatial extents. Mines were significantly clustered in unzoned jurisdictions at the statewide level and in 7 of the 8 counties with at least three frac sand mines and some unzoned land. Subsequent regression analyses showed mine prevalence to be uncorrelated with land value, tax rate, or per capita income, but correlated with remoteness and zoning. The predicted mine count in unzoned townships was over two times higher than that in zoned townships. However, the county with the most mines by far was under a county zoning ordinance, perhaps indicating industry preferences for locations with clear, homogenous rules over patchwork regulation. Rural communities can use the case of frac sand mining as motivation to discuss and plan for sudden land-use predicaments, rather than wait to grapple with unfamiliar legal processes
Regulation of uranium mining in the Northern Territory

International Nuclear Information System (INIS)

McGill, R.A.

2002-01-01

In Australia, uranium and other 'prescribed substances', including thorium and any element having an atomic number greater than 92, are the property of the Commonwealth under the Atomic Energy Act 1953. However, the regulation of mining in Australia is managed by the States. The Uranium Mining Environment Control Act was passed by the NT in 1978 and remains the primary legislation through which uranium mining is regulated. Under working arrangements with the Commonwealth, the NT carries out regulatory activities, including monitoring, evaluation and surveillance, in respect of each of the operating mines. The monitoring is overseen and validated, and its continuing relevance audited, by the Commonwealth Office of the Supervising Scientist and the Northern Land Council representing the local traditional owners. Environment Impact Assessment is co-ordinated jointly by the Commonwealth and the NT and has recently been concluded for the Jabiluka Project. Delays in final approval on this project are occasioned by social concerns expressed by some of the traditional indigenous owners and anti-nuclear protestors. Although Jabiluka is not in a World Heritage area, the concerns have resulted in intervention by the World Heritage Commission. This has required the Company and the Government to modify the way they handle the approval process. This paper analyses the development of the regulatory system which evolved to ensure best practice environmental, occupational health and safety management on the NT uranium mines. (author)
Injection of FGD Grout to Abate Acid Mine Drainage in Underground Coal Mines

Energy Technology Data Exchange (ETDEWEB)

Mafi, S.; Damian, M.T.; Senita, R.E.; Jewitt, W.C.; Bair, S.; Chin, Y.C.; Whitlatch, E.; Traina, S.; Wolfe, W.

1997-07-01

Acid Mine Drainage (AMD) from abandoned underground coal mines in Ohio is a concern for both residents and regulatory agencies. Effluent from these mines is typically characterized by low pH and high iron and sulfate concentrations and may contaminate local drinking-water supplies and streams. The objective of this project is to demonstrate the technical feasibility of injecting cementitious alkaline materials, such as Flue Gas Desulfurization (FGD) material, to mitigate current adverse environmental impacts associated with AMD in a small, abandoned deep mine in Coshocton County, Ohio. The Flue Gas Desulfurization material will be provided from American Electric Power's (AEP) Conesville Plant. It will be injected as a grout mix that will use Fixated Flue Gas Desulfurization material and water. The subject site for this study is located on the border of Coshocton and Muskingum Counties, Ohio, approximately 1.5 miles south-southwest of the town of Wills Creek. The study will be performed at an underground mine designated as Mm-127 in the Ohio Department of Natural Resources register, also known as the Roberts-Dawson Mine. The mine operated in the mid-1950s, during which approximately 2 million cubic feet of coal was removed. Effluent discharging from the abandoned mine entrances has low pH in the range of 2.8-3.0 that drains directly into Wills Creek Lake. The mine covers approximately 14.6 acres. It is estimated that 26,000 tons of FGD material will be provided from AEP's Conesville Power Plant located approximately 3 miles northwest of the subject site.
A clean environment approach to uranium mining

International Nuclear Information System (INIS)

Grancea, Luminita

2015-01-01

A global and multi-faceted response to climate change is essential if meaningful and cost-effective progress is to be made in reducing the effects of climate change around the world. There is no doubt that the uranium mining sector has an important role to play in such a goal. Uranium is the raw material used to produce fuel for long-lived nuclear facilities, necessary for the generation of significant amounts of baseload low-carbon electricity for decades to come. Given expectations of growth in nuclear generating capacity and the associated uranium demand, enhancing awareness of leading practices in uranium mining is indispensable. Actors in the uranium mining sector operate in a complex world, throughout different geographies, and involving global supply chains. They manage climate-sensitive water, land and energy resources and balance the interests of various stakeholders. Managed well, uranium mining delivers sustainable value for economic growth, employment and infrastructure, with specific attention given to the preservation of the environment. In the early phases of the industry, however, downside risks existed, which created legacy environmental and health issues that still can be recalled today. This article addresses key aspects of modern uranium mining operations that have been introduced as regulations and practices have evolved in response to societal attitudes about health, safety and environmental protection. Such aspects of mine management were seldom, if ever, respected in the early stages of uranium mining. With the implementation of modern mine lifecycle parameters and regulatory requirements, uranium mining has become a leader in safety and environmental management. Today, uranium mining is conducted under significantly different circumstances and is now the most regulated and one of the safest forms of mining in the world. Experiences from modern uranium mines show that successful companies develop innovative strategies to manage all the
The Regulation of Acid Mine Drainage in South Africa: Law and Governance Perspectives

Directory of Open Access Journals (Sweden)

Loretta Feris

2014-12-01

Acid mine drainage (AMD) is arguably one of the most serious environmental concerns in South Africa. AMD is a legacy left behind by abandoned, derelict and defunct mines, and is a continuing by-product of existing mining activities. In addition to its environmental impacts, AMD will also impact on all the parameters of sustainability, including ecological, social and economic concerns. In particular, AMD is set to affect infrastructure, displace people and affect their livelihoods, influence economic activity, impact on the resource extraction industry, and affect South Africa's policies and actions in relation to climate change and its efforts to move towards a low carbon economy; and it will test the efficiency of regulatory interventions emanating from both the private and the public sector to the extreme. Given these pervasive challenges, in this article we provide a survey of the AMD problem in South Africa through the law and governance lens. We commence by highlighting the various issues and challenges that result from AMD in the environmental context on the one hand, and the law and governance context on the other hand. We then describe the many provisions of the regulatory framework that we believe would be instrumental in responding to the threat. We conclude the article with brief remarks on what we believe are important considerations in the future regulation of AMD.
Licensing of uranium mine and mill waste management systems

International Nuclear Information System (INIS)

Chamney, L.G.

1986-09-01

Systems for the management of wastes arising from uranium mining facilities are subject to regulatory control by the Atomic Energy Control Board (AECB). This paper describes the primary objectives, principles, requirements and guidelines which the AECB uses in the regulation of waste management activities at uranium mining facilities, and provides an understanding of the licensing process used by the AECB
Engineering Perspectives and Environmental Life Cycle Optimization to Enhance Aggregate Mining in Vietnam

Directory of Open Access Journals (Sweden)

Petra Schneider

2018-02-01

Cleaner Production (CP) addresses precautionary, site-specific environmental measures to reduce emissions and assess resource efficiency potentials at the point of origin by analyzing operational material and energy flows. The approach is generally based on the criteria quality as well as environmental/occupational health and safety, and promotes their integration. The paper presents options for applying CP to aggregate mining, based on a Life Cycle Assessment (LCA) and illustrated by results from a study of small-scale industrial aggregate mining in Hoa Binh Province (Vietnam). The regulatory framework to limit the impact of mining on the environment is largely comparable to international standards and is suitably enforced. Despite gaining experience through the practical handling of enforcement procedures over the long term, there is still a considerable potential to optimize CP strategies in Vietnam's aggregate mining industry. This is shown by the results of a survey of aggregates mining companies in Hoa Binh Province as well as on-site data collection to determine the technological characteristics of production facilities alongside economic and environmental factors. The assessment of the survey is supported by LCA results for: (a) the existing situation; and (b) the scenario of a merging of companies, undertaken to improve the resource efficiency of the aggregate mining in Hoa Binh. Findings can help implement an integrated approach to foster the sustainable mining of building aggregates.
Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.

Science.gov (United States)

Wagner, Mathias; Vicinus, Benjamin; Muthra, Sherieda T; Richards, Tereza A; Linder, Roland; Frick, Vilma Oliveira; Groh, Andreas; Rubie, Claudia; Weichert, Frank

2016-06-01

The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing which is neither unitary, transcending social situation nor defined by a timeless idea is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining. A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL Scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features. The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation were coherent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable. The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable. Copyright © 2016 Elsevier Ltd. All rights reserved.
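The abstract- and sentence-level co-occurrence counts mentioned above can be sketched in a few lines. This is an assumed illustration rather than the tooling used in the study: the term lists, the naive sentence splitter, the function names, and the two toy abstracts are all hypothetical.

```python
import re

# Hypothetical term lists for illustration only.
CHEMOKINE_TERMS = {"cxcl12", "ccl2", "cxcr4"}
CANCER_TERMS = {"cancer", "tumor", "carcinoma", "metastasis"}

def tokens(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def has_both(text):
    """True if the text mentions at least one chemokine and one cancer term."""
    words = tokens(text)
    return bool(words & CHEMOKINE_TERMS) and bool(words & CANCER_TERMS)

def cooccurrence_by_year(records):
    """Map each year to (abstract-level hits, sentence-level hits)."""
    result = {}
    for year, abstract in records:
        # Naive splitter: break after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]
        abstract_hit = 1 if has_both(abstract) else 0
        sentence_hits = sum(1 for s in sentences if has_both(s))
        prev_abstract, prev_sentence = result.get(year, (0, 0))
        result[year] = (prev_abstract + abstract_hit, prev_sentence + sentence_hits)
    return result

# Invented toy abstracts, not taken from the study.
records = [
    (1990, "CXCL12 was measured in serum. Tumor size was recorded separately."),
    (2010, "CXCR4 promotes metastasis in breast cancer. CCL2 levels were unchanged."),
]
by_year = cooccurrence_by_year(records)
print(by_year)
```

Keeping the two counts separate matters because an abstract can mention both kinds of term without ever linking them in a single sentence, as the 1990 toy record shows.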
Proceedings of the twenty-fifth annual institute on mining health, safety and research

Energy Technology Data Exchange (ETDEWEB)

Tinney, G.R.; Bacho, A.; Karmis, M. [eds.

1994-12-31

The keynote session included papers on the US Bureau of Mines - a vision for the future; clean, green and lean; and the psychology of occupational safety. The technical sessions include panel discussions on Virginia's revised mine safety regulations, and on independent contractors. Other papers covered: criminal enforcement of regulatory violations; accidents during surface mine mobile equipment; Federal Mine Safety and Health Review Commission; safety, technological and productivity potentials of highwall mining; and accidents caused by falls of unsupported roof.
5 CFR 5201.105 - Additional rules for Mine Safety and Health Administration employees.

Science.gov (United States)

2010-01-01

... for Mine Safety and Health Administration employees. The rules in this section apply to employees of... Mine Safety and Health Act. Example: A mine inspector who was a former employee of mining company X... Secretary of labor for Mine Safety and Health or the Assistant Secretary's designee may grant an employee a...
The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

Directory of Open Access Journals (Sweden)

Reza Samizade

2018-06-01

Classification of cyber texts and comments into two categories of positive and negative sentiment among social media users is of high importance in research areas related to text mining. In this research, we applied supervised classification methods to classify Persian texts based on sentiment in cyber space. The result of this research is a system that can decide whether a comment published in cyber space, such as on social networks, is positive or negative. The comments published on Persian movie and movie review websites from 1392 to 1395 are used as the data set for this research. Part of these data is used for training and the rest for testing. Prior to implementing the algorithms, pre-processing activities such as tokenizing, removing stop words, and n-gram extraction were applied to the texts. Naïve Bayes, neural networks (NN) and support vector machines (SVM) were used for text classification in this study. Out-of-sample tests showed no evidence that the accuracy of the SVM approach is statistically higher than that of Naïve Bayes, or that the accuracy of Naïve Bayes is statistically higher than that of the NN approach. However, the researchers can conclude that the accuracy of classification using the SVM approach is statistically higher than that of the NN approach at the 5% significance level.
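The pre-processing and classification steps named in this abstract (tokenizing, stop-word removal, n-grams, Naïve Bayes) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' system: it uses English toy reviews instead of Persian text, a hypothetical stop-word list, unigram plus bigram features, and a multinomial Naïve Bayes with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for the toy English data.
STOP_WORDS = {"the", "a", "was", "is", "this", "and", "it"}

def features(text, n=2):
    """Tokenize, drop stop words, and return unigrams plus n-grams up to size n."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    grams = list(words)
    for size in range(2, n + 1):
        grams += [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    return grams

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feat in features(text):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training set, not taken from the study.
train_texts = [
    "great movie loved it",
    "wonderful acting great story",
    "boring movie bad acting",
    "terrible plot bad story",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great story"))
```

Working in log space avoids numeric underflow when many feature probabilities are multiplied, and add-one smoothing keeps a single unseen feature from zeroing out a class.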
76 FR 35213 - AJT Mining Properties, Inc.; Notice of Preliminary Permit Application Accepted for Filing and...

Science.gov (United States)

2011-06-16

... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission [Project No. 14115-000] AJT Mining..., Motions To Intervene, and Competing Applications On March 21, 2011, AJT Mining Properties, Inc., filed an...,000 megawatt-hours. Applicant Contact: Mr. Scott Willis, AJT Mining Properties, Inc., 5601 Tonsgard...
State policies and requirements for management of uranium mining and milling in New Mexico. Volume V. State policy needs for community impact assistance

International Nuclear Information System (INIS)

Vandevender, S.G.

1980-04-01

The report contained in this volume describes a program for management of the community impacts resulting from the growth of uranium mining and milling in New Mexico. The report, submitted to Sandia Laboratories by the New Mexico Department of Energy and Minerals, is reproduced without modification. The state recommends that federal funding and assistance be provided to implement a growth management program comprised of these seven components: (1) an early warning system, (2) a community planning and technical assistance capability, (3) flexible financing, (4) a growth monitoring system, (5) manpower training, (6) economic diversification planning, and (7) new technology testing
Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

Science.gov (United States)

Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

2015-01-01

Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single
77 FR 74845 - AJT Mining Properties, Inc.; Notice of Surrender of Preliminary Permit

Science.gov (United States)

2012-12-18

... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission [Project No. 14115-001] AJT Mining Properties, Inc.; Notice of Surrender of Preliminary Permit Take notice that AJT Mining Properties, Inc., permittee for the proposed Yeldagalga Creek Hydroelectric Project, has requested that its preliminary permit...
Environmental assessment related to the operation of Hansen uranium mill project, WM-24, Cyprus Mines Corporation

International Nuclear Information System (INIS)

1981-01-01

An environmental assessment was prepared by the staff of the U.S. Nuclear Regulatory Commission, Office of Nuclear Material Safety and Safeguards, in response to a request for technical assistance from the State of Colorado in connection with licensing action on the proposed Cyprus Mines Corporation, Hansen uranium project. The major components of discussion are (1) a summary and recommended licensing conditions, (2) a description of the site environment and the proposed facility operation as well as alternatives in comparison with NRC's performance objectives for tailings management, and (3) a radiological assessment for estimating the facility's compliance with 10 CFR 20 and 40 CFR 190 dose regulations. The NRC recommends licensing the proposed mill subject to stipulated license conditions
Research on the prevention of mine accident

Energy Technology Data Exchange (ETDEWEB)

Cho, Won Jai; Kang, Chang Hee; Lee, Sang Kwon; Lee, Jong Lim; Kim, Chung Han; Hong, Sung Gyu [Korea Inst. of Geology Mining and Materials, Taejon (Korea, Republic of)

1995-12-01

This research aims to provide appropriate mine safety measures and a basis for the long-term development of operating mines through overall safety inspections. In this first project year, the Jongam mine owned by Samtan Co. Ltd. and the Hwasun mine of Daihan Coal Corporation were the targets of the research. The major issues at the Jongam mine were a lack of pumping capacity to handle ever-increasing underground water, mainly due to inflow from adjacent closed mines, and insufficient investment in preparing a long-term program. At the Hwasun mine, the major problems are surface subsidence and water inflow caused by extraction of a large-scale pocket-type ore body. In addition, in most cases the morale of mine workers and the business outlook of owners are so depressed that mine safety is becoming vulnerable. From this point of view, regulatory and systematic measures to encourage workers' morale and owners' willingness to invest are urgently needed. However, the investigation of underground electrical hazards found no notable problems. The average pump efficiency was 50%, which is still considered a rather good condition, and no coal seams bearing excessive carbon dioxide gas were found. (author). 21 refs., 40 figs., 81 tabs.
Legal and regulatory issues affecting compressed air energy storage

Energy Technology Data Exchange (ETDEWEB)

Hendrickson, P.L.

1981-07-01

Several regulatory and legal issues that can potentially affect implementation of a compressed air energy storage (CAES) system are discussed. This technology involves the compression of air using base load electric power for storage in an underground storage medium. The air is subsequently released and allowed to pass through a turbine to generate electricity during periods of peak demand. The storage media considered most feasible are a mined hard rock cavern, a solution-mined cavern in a salt deposit, and a porous geologic formation (normally an aquifer) of suitable structure. The issues are discussed in four categories: regulatory issues common to most CAES facilities regardless of storage medium, regulatory issues applicable to particular CAES reservoir media, issues related to possible liability from CAES operations, and issues related to acquisition of appropriate property rights for CAES implementation. The focus is on selected federal regulation. Lesser attention is given to state and local regulation. (WHK)
Exploratory analysis of textual data from the Mother and Child Handbook using the text-mining method: Relationships with maternal traits and post-partum depression.

Science.gov (United States)

Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Sato, Shuhei; Ohwada, Michitaka

2016-06-01

The aim of the present study was to examine the possibility of screening apprehensive pregnant women and mothers at risk for post-partum depression from an analysis of the textual data in the Mother and Child Handbook by using the text-mining method. Uncomplicated pregnant women (n = 58) were divided into two groups according to State-Trait Anxiety Inventory grade (high trait [group I, n = 21] and low trait [group II, n = 37]) or Edinburgh Postnatal Depression Scale score (high score [group III, n = 15] and low score [group IV, n = 43]). An exploratory analysis of the textual data from the Maternal and Child Handbook was conducted using the text-mining method with the Word Miner software program. A comparison of the 'structure elements' was made between the two groups. The number of structure elements extracted by separated words from text data was 20 004 and the number of structure elements with a threshold of 2 or more as an initial value was 1168. Fifteen key words related to maternal anxiety, and six key words related to post-partum depression were extracted. The text-mining method is useful for the exploratory analysis of textual data obtained from pregnant women, and this screening method has been suggested to be useful for apprehensive pregnant women and mothers at risk for post-partum depression. © 2016 Japan Society of Obstetrics and Gynecology.
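The frequency threshold mentioned in the abstract above can be illustrated with a minimal sketch. It is not the Word Miner procedure used in the study: the tokenizer, the function name `structure_elements`, and the sample notes are all hypothetical and only show the general idea of keeping words that occur at least twice.

```python
import re
from collections import Counter


def structure_elements(texts, min_count=2):
    """Count word tokens across all texts and keep those seen at least min_count times.

    Hypothetical helper for illustration; a simple regex tokenizer stands in
    for the word separation performed by dedicated text-mining software.
    """
    counts = Counter()
    for text in texts:
        # Lowercase and split into alphabetic tokens (apostrophes allowed).
        counts.update(re.findall(r"[a-z']+", text.lower()))
    kept = {word: n for word, n in counts.items() if n >= min_count}
    return counts, kept


# Invented sample notes, not data from the study.
notes = [
    "worried about the baby and feeling tired",
    "feeling anxious, worried about sleep",
    "baby is healthy",
]
all_counts, kept = structure_elements(notes)
print(len(all_counts), "distinct words;", len(kept), "at or above the threshold")
```

Raising `min_count` shrinks the vocabulary further, which is the same trade-off the abstract describes when it reduces 20 004 extracted elements to 1168.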

The Effects of Self-Regulatory Learning through Computer-Assisted Intelligent Tutoring System on the Improvement of EFL Learners' Speaking Ability

Science.gov (United States)

Mohammadzadeh, Ahmad; Sarkhosh, Mehdi

2018-01-01

The current study attempted to investigate the effects of self-regulatory learning through computer-assisted intelligent tutoring system on the improvement of speaking ability. The participants of the study, who spoke Azeri Turkish as their mother tongue, were students of Applied Linguistics at BA level at Pars Abad's Azad University, Ardebil,…
Thiosulphate assisted phytoextraction of mercury contaminated soils at the Wanshan Mercury Mining District, Southwest China

Directory of Open Access Journals (Sweden)

J. Wang

2013-10-01

Full Text Available Wanshan, known as the “Mercury Capital” of China, is located in the Southwest of China. Due to the extensive mining and smelting works in the Wanshan area, the local ecosystem has been seriously contaminated with mercury. In the present study, a number of soil samples were taken from the Wanshan mercury mining area and the mercury fractionations in soils were analyzed using a sequential extraction procedure. The results showed that the dominant mercury fractions (representing 95% of total mercury) were residual and organic-bound mercury. A field trial was conducted in mercury-polluted farmland at the Wanshan mercury mine. Four plant species, Brassica juncea Czern. et Coss. var. ASKYC (ASKYC), Brassica juncea Czern. et Coss. var. DPDH (DPDH), Brassica juncea Czern. et Coss. var. CHBD (CHBD) and Brassica juncea Czern. et Coss. var. LDZY (LDZY), were tested for their ability to extract mercury from soil with thiosulphate amendment. The results indicated that the mercury concentrations in the roots and shoots of the four plants were significantly increased with thiosulphate treatment. The mercury phytoextraction yields of ASKYC, DPDH, CHBD and LDZY were 92, 526, 294 and 129 g/ha, respectively.
Aligning environmental and regulatory procedures with a holistic project management approach for residue deposits

CSIR Research Space (South Africa)

Snyman, BJ

2006-01-01

Full Text Available 1.1 Legislative responses in South Africa: The advent of a democratic South Africa has brought about the promulgation of new environmental legislation. This was necessary to give legal effect to the principles of sustainability as laid down.... Often project delays can be ascribed to non-conformance to the legal and regulatory processes, because: • The South African EIA legislative process that is applicable to mine residue deposition does not focus on project management procedures, project...
Radiation protection programme for uranium mining

International Nuclear Information System (INIS)

Mbeye, M.J.

2014-04-01

The Radiation Protection Programme (RPP) was developed to ensure that measures are in place for occupational protection and safety in a uranium mining facility. This work has established a number of protective measures that should be taken by the individual miners, the licensee and all staff. It is not known whether Kayerekera Uranium mine has the technical and administrative capability for an effective radiation protection programme. The key in the mining facility is the control of dust through various means to prevent the escape of radon gas. Personal hygiene and local operating rules were found to be very important for the protection and safety of the workers. The following components were also found to be vital in ensuring a safety culture in the mining facility: classification of working areas, monitoring of individuals and the workplace, assignment of responsibilities, emergency preparedness, education and training, and health surveillance. The regulatory body (Environmental Affairs Department of Malawi) should examine the major areas outlined in the RPP for Kayerekera uranium mine to determine the effectiveness of the RPP that is in place. (au)
Ensuring the Environmental and Industrial Safety in Solid Mineral Deposit Surface Mining

Science.gov (United States)

Trubetskoy, Kliment; Rylnikova, Marina; Esina, Ekaterina

2017-11-01

The growing environmental pressure of mineral deposit surface mining and the tightening of industrial safety requirements dictate the necessity of refining the regulatory framework governing safe and efficient development of underground resources. The applicable regulatory documentation governing the procedure of ore open-pit wall and bench stability design for the stage of pit reaching its final boundary was issued several decades ago. Over recent decades, mining and geomechanical conditions have changed significantly in surface mining operations, numerous new software packages and computer developments have appeared, the opportunities offered by experimental methods of source data collection and processing and for substantiating the permissible parameters of open pit walls have changed dramatically, and, thus, methods of risk assessment have been refined [10-13]. IPKON RAS, with the support of the Federal Service for Environmental Supervision, initiated the project to develop the Federal norms and regulations of industrial safety "Rules for ensuring the stability of walls and benches of open pits, open-cast mines and spoil banks", which contribute to improving the economic efficiency and safety of mineral deposit surface mining and to enhancing the competitiveness of Russian mines at the international level, which is very important in the current situation.
Competent authority regulatory control of the transport of radioactive material

International Nuclear Information System (INIS)

1987-04-01

The purpose of this guide is to assist competent authorities in regulating the transport of radioactive materials and to assist users of transport regulations in their interactions with competent authorities. The guide should assist specifically those countries which are establishing their regulatory framework and further assist countries with established procedures to harmonize their application and implementation of the IAEA Regulations. This guide specifically covers various aspects of the competent authority implementation of the IAEA Regulations for the Safe Transport of Radioactive Material. In addition, physical protection and safeguards control of the transport of nuclear materials as well as third party liability aspects are briefly discussed. This is because they have to be taken into account in overall transport regulatory activities, especially when establishing the regulatory framework
Specialized mining GIS system MineGIS SMZ Jelšava

Directory of Open Access Journals (Sweden)

Peter Sasvári

2005-12-01

Full Text Available Following the real needs for a new mining information system requested by SMZ Jelšava, the Department of Mineral Deposits and Applied Geology (KLaAG) at the Technical University of Košice (TUKE) has prepared a specification for the specialized mining geographic information system called MineGIS SMZ Jelšava. The main roles of the new system have been defined as follows: the administration, analysis and visualization of all mining geo-data related to the estimation of reserves.
Operation and monitoring guidelines and the development of a screening tool for irrigating with coal mine water in Mpumalanga Province, South Africa

Energy Technology Data Exchange (ETDEWEB)

Vermeulen, D.; Usher, B. [University of Free State, Bloemfontein (South Africa). Institute of Groundwater Studies

2009-07-15

It is predicted that vast volumes of impacted mine water will be produced by mining activities in the Mpumalanga coalfields of South Africa. The potential environmental impact of this excess water is of great concern in a water-scarce country like South Africa. Detailed research has been undertaken over the past number of years on both undisturbed soils and in coal-mining spoils. These sites range from sandy soils to very clayey soils. The results indicate that many of the soils have considerable attenuation capacities and that over the period of irrigation, a large proportion of the salts are contained in the upper portions of the unsaturated zones below each irrigation pivot. The volumes and quality of water leaching through to the aquifers have been quantified at each site. From these data mixing ratios were calculated in order to determine the effect of the irrigation water on the underlying aquifers. One of the outcomes from this study was to define the conditions under which mine-water irrigation can be implemented and the associated operational and monitoring guidelines that should be followed. These have been based on the findings from this study, the fundamental considerations of mine-water irrigation, the regulatory environment and, as far as possible, the practical implementation of mine-water irrigation as part of optimal mine-water management. In an attempt to standardise decision-making regarding mine-water irrigation, the criteria, data, rules and fundamentals discussed have been combined in a user-friendly tool, called GIMI (Groundwater Impacts from Minewater Irrigation). This tool should assist in the practical implementation of mine-water irrigation as part of optimal mine-water management.
Report on the Uranium Mine Radiation Safety Course

International Nuclear Information System (INIS)

1987-06-01

Since 1981 the Canadian Institute for Radiation Safety (CAIRS) has administered a semi-annual course on radiation safety in uranium mines under contract to and in consultation with the Atomic Energy Control Board (AECB). The course is intended primarily for representatives from mining companies, regulatory agencies, unions, and mine and mill workers. By the terms of its contract with the AECB, CAIRS is required to submit a report on each course it conducts. This is the report on the June 1987 course. It lists the course objectives and the timetable, outlines for each lecture, the lecturers' resumes, and the participants. The students' evaluations of the course are included
Management councils and regulation: public assistance in times of transition

Directory of Open Access Journals (Sweden)

Carla Cecília Rodrigues Almeida

2009-10-01

Full Text Available This article analyzes the role of municipal public assistance councils in the regulation of civil society organizations classified as social welfare organizations. We draw attention to the importance that these councils have for the operationalization of the more general principles governing the country’s social welfare/public assistance policies. Focusing on the regulatory role that management councils take on, we seek to understand, on the one hand, the wider institutional environment to which they belong and, on the other, their power to remodel a particular type of associativism which until very recently was identified, and identified itself, as philanthropic. Keywords: municipal councils, social welfare, Sistema Único de Assistência Social, regulation, public assistance entities.
Acid mine drainage as an important mechanism of natural radiation enhancement in mining areas

International Nuclear Information System (INIS)

Fernandes, H.M.; Franklin, M.R.

2002-01-01

Acid mine drainage (AMD) is a worldwide problem that occurs whenever sulfidic material is present in association with the mined ore. The acidic waters generated by the oxidation of sulfide minerals can mobilize significant amounts of pollutants and cause significant environmental impacts. The composition of the drainage will depend, to a very large extent, on the mineralogy of the rocks. The purpose of this paper is to demonstrate that acid mine drainage has the potential to enhance the natural levels of environmental radioactivity. The paper reviews some strategies to be used in diagnosing the problem. General mathematical formulations that can assist in predicting the duration of the problem and in defining the size of the oxidizing zones in a waste dump are given. A case study on a waste dump of the Pocos de Caldas Uranium Mining Site, Brazil is also presented. (author)
Hydrogeology and hydrochemistry of the Midnite Mine, northeastern Washington

International Nuclear Information System (INIS)

Marcy, A.D.; Scheibner, B.J.; Toews, K.L.; Boldt, C.M.K.

1994-01-01

The Midnite Mine is an inactive, hard-rock uranium mine in Stevens County, WA. Oxidation of sulfide-containing minerals, primarily pyrite, in the ore body produces large quantities of acidic water. An interception system installed by the mining company limits the discharge of contaminated water from the mine. The US Bureau of Indian Affairs (BIA) and the US Bureau of Land Management (BLM) have been actively involved in planning remediation of the disturbed areas. To assist in remediation, the US Bureau of Mines (USBM) initiated research to determine water quality and to define ground water flow characteristics. USBM personnel designed a monitoring network, supervised installation of sampling wells, and collected and analyzed water samples. This Report of Investigations describes the interpretation of data collected between December 1989 and April 1992. The computer program WATEQ4F was used to identify aqueous species distribution and to calculate potential solid phase controls of solubility. To assist in interpretation of changes in water quality between sampling locations and to develop models describing proposed flow paths, the computer program BALANCE was used. Using output from these programs and field observations, a description of the chemistry along proposed ground water flow paths at the mine is presented.
Automated Text Data Mining Analysis of Five Decades of Educational Leadership Research Literature: Probabilistic Topic Modeling of "EAQ" Articles From 1965 to 2014

Science.gov (United States)

Wang, Yinying; Bowers, Alex J.; Fikis, David J.

2017-01-01

Purpose: The purpose of this study is to describe the underlying topics and the topic evolution in the 50-year history of educational leadership research literature. Method: We used automated text data mining with probabilistic latent topic models to examine the full text of the entire publication history of all 1,539 articles published in…
Risk-based Regulatory Evaluation Program methodology

International Nuclear Information System (INIS)

DuCharme, A.R.; Sanders, G.A.; Carlson, D.D.; Asselin, S.V.

1987-01-01

The objectives of this DOE-supported Regulatory Evaluation Program are to analyze and evaluate the safety importance and economic significance of existing regulatory guidance in order to assist in the improvement of the regulatory process for current generation and future design reactors. A risk-based cost-benefit methodology was developed to evaluate the safety benefit and cost of specific regulations or Standard Review Plan sections. Risk-based methods can be used in lieu of or in combination with deterministic methods in developing regulatory requirements and reaching regulatory decisions.
Regulatory RNA-assisted genome engineering in microorganisms.

Science.gov (United States)

Si, Tong; HamediRad, Mohammad; Zhao, Huimin

2015-12-01

Regulatory RNAs are increasingly recognized and utilized as key modulators of gene expression in diverse organisms. Thanks to their modular and programmable nature, trans-acting regulatory RNAs are especially attractive in genome-scale applications. Here we discuss the recent examples in microbial genome engineering implementing various trans-acting RNA platforms, including sRNA, RNAi, asRNA and CRISPR-Cas. In particular, we focus on how the scalable and multiplex nature of trans-acting RNAs has been used to tackle the challenges in creating genome-wide and combinatorial diversity for functional genomics and metabolic engineering applications. Advances in computational design and context-dependent regulation are also discussed for their contribution in improving fine-tuning capabilities of trans-acting RNAs. Copyright © 2015 Elsevier Ltd. All rights reserved.
The Cogemagazine reviews. The rehabilitation of mining sites in France; Les cahiers de Cogemagazine. Le reamenagement des sites miniers en France

Energy Technology Data Exchange (ETDEWEB)

Loriot, O.; Bof, M.; Villeneuve, A

1998-02-01

The French uranium mines are progressively closing down. After a mining division has closed down, the main objectives of the Cogema group are: ensuring the long-term safety and healthiness of the site, reducing the residual impacts, preventing any abusive intrusion, reducing the surface of land submitted to right-of-way, encouraging the reconversion of the site, and succeeding in the integration of the site in the landscape in agreement with the local authorities. This brochure presents the strategy followed by Cogema for the rehabilitation of its sites: the French mining concessions and the uranium extraction and processing techniques, the storage of tailings and processing residues, environmental protection and compliance with regulations (environmental surveillance, working groups, administrative procedures and regulatory texts, impact studies...), the backfilling and safety of underground mines, the cost studies for the rehabilitation of open cast mines, the dismantling of factories, the confinement of residues and revegetation, and the continuous monitoring of the rehabilitated sites (water, atmosphere, food..). (J.S.)
Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use

Science.gov (United States)

White, Sheida

2012-01-01

This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…
Novel mining methods

CSIR Research Space (South Africa)

Monchusi, B

2012-10-01

Full Text Available Novel Mining Methods 4th... 2012. Slide 12: CSIR mine safety platform; AR Drone; differential time-of-flight beacon. Sampling. Slide 13: Reef Laser-Induced Breakdown Spectroscopy (LIBS) head; scan X-Y; laser/spectrometer/computer. Rock Breaking. Slide...
Homelessness in Queensland mining communities: A down payment on Australia’s wealth or inevitable product of a neo-liberalist society’s response to the cyclical fortunes of mining

Directory of Open Access Journals (Sweden)

Shane Matthew Warren

2015-09-01

Full Text Available The mining boom in Australia of the first decade of the Twenty-First Century yielded prosperity for many Australians living in rural, regional and urban locations. This sense of prosperity was grounded in the widely reported experiences of people usually employed directly through the mining industry, or related industry, on high incomes, able to afford regular overseas holidays, ownership of multiple properties, material possessions and other hallmarks of an affluent lifestyle. However, less attention was given to vulnerable and homeless Australians in mining communities who did not benefit at all during the mining boom. In fact, what evidence does exist indicates their disadvantage was further compounded through the high cost of housing. It is now widely accepted that the mining industry has been in a state of downturn over the last three years and this has served to highlight the social issues facing mining communities now and into the future. What is to be learnt from the decade-long mining boom? Specifically, this paper critiques the evidence, research literature and theories about urban-centric homelessness and assesses their relevance to homelessness in mining communities. This paper argues that the dynamics of homelessness in mining communities challenge existing homelessness theory and knowledge and argues that further evidence is needed to properly understand structural causes of homelessness in mining communities and to guide policy responses that may help prevent homelessness or otherwise assist homeless people to access housing and support services. The mining boom and downturn cycle will also be explored. Finally, this paper outlines the case for further research to improve policy and planning responses to address homelessness in these communities, taking into account planning requirements to address the mining boom and downturn cycle.

A summary of fish and wildlife information needs to surface mine coal in the United States. Part 3. A handbook for meeting fish and wildlife information needs to surface mine coal: OSM Region III. Final report

Energy Technology Data Exchange (ETDEWEB)

Hinkle, C.R.; Ambrose, R.E.; Wenzel, C.R.

1981-02-01

The report contains information to assist in protecting, enhancing, and reducing impacts to fish and wildlife resources during surface mining of coal. It gives information on the premining, mining, reclamation and compliance phases of surface mining. Methods and sources to obtain information to satisfy state and Federal regulations are presented. Considerable emphasis is placed on postmining assistance. This volume is specifically for the states of Minnesota, Wisconsin, Michigan, Illinois, Indiana and Ohio.
77 FR 25868 - Iowa Regulatory Program

Science.gov (United States)

2012-05-02

... reference of applicable portions of 30 CFR part 700 to End from the July 1, 2002, version to the July 1, 2010, version. Additionally, Iowa proposed to revise its Program related to ownership and control by... the Iowa regulatory program (Iowa program) under the Surface Mining Control and Reclamation Act of...
Perils of project development on public land open to mining

International Nuclear Information System (INIS)

Jacobs, W.R.

1991-01-01

Conducting a government project on public land open to the general mining laws can result in added costs, legal entanglements, schedule uncertainties, and the potential for unanticipated safety issues and concerns due to interactions with mining claimants. Planning for such projects must include a careful assessment of not only land access needs and restrictions, but also possible scenarios for conflict with activities authorized under the general mining laws throughout the life of the project. It is essential to have a thorough knowledge of the applicable mining laws and how they are currently being interpreted and applied by the responsible regulatory authorities and land managers. The Yucca Mountain Project approach to land access, problems encountered with mining claims filed under the Mining Law of 1872, and the lessons learned from these experiences are discussed in this paper
Environmental Development Plan: uranium mining, milling, and conversion

International Nuclear Information System (INIS)

1979-08-01

This Environmental Development Plan (EDP) identifies the planning and management requirements and schedules needed to evaluate and assess the environmental, health, and safety (EH and S) aspects of the uranium mining, milling, and conversion technologies. The plan represents the collective perceptions of EH and S concerns and requirements and knowledge of ongoing research programs of most of the Federal agencies involved in significant EH and S R and D program management, standards setting, or regulatory activities associated with uranium mining, milling, and conversion
Mining waste contaminated lands: an uphill battle for improving crop productivity

Directory of Open Access Journals (Sweden)

B M Kumar

2013-10-01

Full Text Available Mining drastically alters the physico-chemical and biological environment of the landscape. Low organic matter content, unfavourable pH, low water holding capacity, salinity, coarse texture, compaction, siltation of water bodies due to wash off of mineral overburden dumps, inadequate supply of plant nutrients, accelerated erosion, acid generating materials, and mobilization of contaminated sediments into the aquatic environment are the principal constraints experienced in mining contaminated sites. A variety of approaches have been considered for reclaiming mine wastes including direct revegetation of amended waste materials, top soiling, and the use of capillary barriers. The simplest technology to improve crop productivity is the addition of organic amendments. Biosolids and animal manure can support revegetation, but their rapid decomposition, especially in the wet tropics, necessitates repeated applications. Recalcitrant materials such as “biochars”, which improve soil properties on a long term basis as well as promote soil carbon sequestration, hold enormous promise. An eco-friendly and cost-effective Microbe Assisted Phytoremediation system has been proposed to increase biological productivity and fertility of mine spoil dumps. Agroforestry practices may enhance the nutrient status of degraded mine spoil lands (facilitation). N-fixing trees are important in this respect. Metal tolerant ecotypes of grasses and calcium-loving plants help restore lead, zinc, and copper mine tailings and gypsum mine spoils, respectively. Overall, an integrated strategy of introduction of metal tolerant plants, genetic engineering for enhanced synthesis and exudation of natural chelators into the rhizosphere, improvement of rhizosphere, and integrated management including agroforestry will be appropriate for reclaiming mining contaminated lands.
Uranium mining and production: A legal perspective on regulating an important resource

International Nuclear Information System (INIS)

Thiele, Lisa

2013-01-01

The importance of uranium can be examined from several perspectives. First, natural uranium is a strategic energy resource because it is a key ingredient for the generation of nuclear power and, therefore, it can affect the energy security of a state. Second, natural uranium is also a raw material in relative abundance throughout the world, which can, through certain steps, be transformed into nuclear explosive devices. Thus, there is both an interest in the trade of uranium resources and a need for their regulatory control. The importance of uranium to the worldwide civilian nuclear industry means that its extraction and processing - the so-called 'front end' of the nuclear fuel cycle - is of regulatory interest. Like 'ordinary' metal mining, which is generally regulated within a country, uranium mining must also be considered from the more particular perspective of regulation and control, as part of the international nuclear law regime that is applied to the entire nuclear fuel cycle. The present overview of the regulatory role in overseeing and controlling uranium mining and production will outline the regulation of this resource at an international level, from the early days to the present day. Uranium mining is not regulated internationally; rather, it is a state responsibility. However, developments at the international level have, over time, led to better national regulation. One can note several changes in the approach to the uranium industry since the time that uranium was first mined on a significant scale, so that today the mining and trade of uranium is a well-established and regulated industry much less marked by secrecy and Cold War sentiment. At the same time, it is informed by international standards and conventions, proliferation concerns and a modern regard for environmental protection and the health and safety of workers and the public. (author)
The Asian Development Model and Mining Reforms in Indonesia

Directory of Open Access Journals (Sweden)

John McLaren

2015-07-01

The aim of this paper is to provide accounting, marketing, management, finance and legal professionals who are engaged with emerging economies with an introduction to the ‘Asian Development Model’ and to use the mining reforms in Indonesia as an example of the Model in operation. This will assist those professionals in recognising the challenges faced by businesses in Australia and New Zealand when governments in South East Asian countries attempt to ‘catch up’ to the developed world and at the same time attempt to spread the benefits of the development to their people. The paper argues that there is an Asian Development Model and that the Indonesian mining reforms, in particular the requirement over time for 51 percent Indonesian Government ownership and the ban on the export of unprocessed resources, represent an attempt by the Indonesian State to speed up industrialisation in the country and to spread more of the benefits from mining to ordinary citizens in the recently democratised and politically decentralised country. In attempting to show strength, however, the Indonesian state is exposing some weakness. The impact on jobs, revenue and production has been adverse, although Foreign Direct Investment has increased. The latter may be because foreign multinational mining companies are better placed than local mining enterprises to build smelters. The success of developing mining might be at the expense of local capital. In other words, state intervention does not always produce all of the desired outcomes. It is not a panacea.
Anomaly Detection with Text Mining

Data.gov (United States)

National Aeronautics and Space Administration — Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The...
Radiation protection of workers in uranium mining, ore processing and fuel fabrication in India

International Nuclear Information System (INIS)

Khan, A.H.; Jha, G.; Jha, S.; Srivastava, G.K.; Sadasivan, S.; Venkat Raj, V.

2002-01-01

Low-grade uranium ore mined from three underground mines is processed in a mill at Jaduguda in eastern India to recover uranium concentrate in the form of yellow cake. This concentrate is further processed at the Nuclear Fuel Complex at Hyderabad, in southern India, to produce fuel for use in nuclear power plants. Radiation protection of workers is given due importance at all stages of these operations. Dedicated Health Physics Units and Environmental Survey Laboratories established at each site regularly carry out in-plant and environmental surveillance to keep radiation exposure of workers and members of the public within the limits prescribed by the regulatory body. The limits set by the national regulatory body are based on the international standards suggested by the ICRP and the IAEA. In the uranium mines, external gamma radiation, radon and airborne activity due to radioactive dust are monitored. Similarly, in the uranium mill and the fuel fabrication plant, gamma radiation and airborne radioactivity due to long-lived α-emitters are monitored. Personal dosimeters are also issued to workers. The total radiation exposure of workers from external and internal sources is evaluated from the personal monitoring and area monitoring data. It has been observed that the total radiation dose to workers has been well below 20 mSv y⁻¹ at all stages of operations. Adequate ventilation is provided during mining, ore processing and fuel fabrication operations to keep the concentrations of airborne radioactivity well below the derived limits. Workers use personal protective appliances, where necessary, as a supplementary means of control. The monitoring methodologies, results and control measures are presented in the paper.
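Implementasi Data Warehouse dan Data Mining: Studi Kasus Analisis Peminatan Studi Siswa [Implementation of Data Warehouse and Data Mining: A Case Study of Student Specialization Analysis]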

Directory of Open Access Journals (Sweden)

Eka Miranda

2011-06-01

This paper discusses the implementation of data mining and its role in supporting decisions about students’ choice of specialization program. Currently, the university uses a database to store transaction records, which cannot be used directly for analysis and decision making. To address this, a data warehouse was designed to store large amounts of data; it also offers new perspectives on data distribution, makes it possible to answer ad hoc questions, and supports data analysis. The method consists of analysing records of students’ academic achievement, designing the data warehouse, and data mining. The results are a data warehouse and data mining design and its implementation with classification techniques and association rules. These results reveal students’ tendencies and background patterns in choosing a specialization, which can help them make decisions.
Ask and Ye Shall Receive? Automated Text Mining of Michigan Capital Facility Finance Bond Election Proposals to Identify Which Topics Are Associated with Bond Passage and Voter Turnout

Science.gov (United States)

Bowers, Alex J.; Chen, Jingjing

2015-01-01

The purpose of this study is to bring together recent innovations in the research literature around school district capital facility finance, municipal bond elections, statistical models of conditional time-varying outcomes, and data mining algorithms for automated text mining of election ballot proposals to examine the factors that influence the…
Sustainability of new uranium mining projects in Argentina

International Nuclear Information System (INIS)

Navarra, P.R.

2002-01-01

The regulatory framework issued in the 1994-1995 period connected mining activities in Argentina with international good environmental practices. Agreements between the National Government and the Provinces allow the regulations to be applied, while Act No 24.585, the milestone on the matter, establishes the steps for approval of the Report of Environmental Impact at successive stages of a project. Specifically for uranium mining and milling, the assessment of the radiological protection aspects of the planned activities is assigned to the Nuclear Regulatory Authority. The National Atomic Energy Commission (CNEA) is at present carrying out two uranium mining projects, which involve the Sierra Pintada and Cerro Solo deposits. Their goal is to restart uranium production in the country in the medium term by narrowing the gap between indigenous and market uranium prices. The first project consists of updating the feasibility study of the currently inactive Sierra Pintada Production Center (Mendoza Province). Studies for improving the mining and treatment methods are performed in the project, in coordination with the investigation and forecasting of the management of mining waste and processing tailings. In addition, procedures will be determined for the existing waste and tailings, taking into account the methodology to be applied at the closure stage. The objective of the second project is the development of the Sierra de Pichinan District, Chubut Province (U-Mo). Notably, for Cerro Solo, the main ore deposit in this area, which is at the prefeasibility stage, CNEA is currently encouraging private investment through a bidding process. Environmental studies are an important aspect of the activities carried out and planned in the area. In conclusion, with regard to uranium mining and milling activities in Argentina, the regulations and environmental technical-scientific knowledge are becoming compatible with sustainable practice. (author)
Superficial drainage studies in open-pit mines

International Nuclear Information System (INIS)

Teixeira Junior, P.B.; Leite, C.B.B.

1984-01-01

Drainage studies concerning large open-pit mining projects can be of vital importance throughout the mining activity itself, as they may assist in avoiding activity interruptions due to drainage problems, therefore representing substantial savings. These studies should, in fact, be carried out from the initial activity stages and be considered in operational, project and planning decisions in order to optimize results and reduce costs. This study presents a proposal for systematizing drainage studies, emphasizing economic decision criteria. The authors comment on studies of this nature developed at the Caldas uranium mine - NUCLEBRAS. (D.J.M.)
Mining and environment

International Nuclear Information System (INIS)

Pimiento, Elkin Vargas

1998-01-01

In order to obtain the best social and environmental results from mining activities, different solutions involving a variety of perspectives have been proposed. These include the worldwide perspective, based on the paradigms of economic globalization; the regional perspective, focused on the integration of countries; the national perspective, which emphasizes natural assets and development options; and finally a local perspective, incorporated to account for the participation of directly affected communities. Within this framework, the mining industry is asked to develop both technological and managerial tools to evaluate, optimize and communicate the social and environmental performance and output of its activities, mainly in developing countries. Governments, for their part, have committed to implementing command-and-control regulatory actions, based on environmental legislation in line with the above-mentioned perspectives, and to using economic instruments as a means of accomplishing environmental objectives. In Colombia, direct regulation methods have traditionally been used to prevent the environmental deterioration produced by mining activities; however, with the 1991 political constitution and Law 99 of 1993, community participation and economic instruments were incorporated. A historical summary of the environmental legislation in our country from the early 1970s up to now, showing its implications for mining, is presented. A favorable trend is then identified in the environmental improvement of the national extractive industry, achieved as a result of implementing new strategies to minimize the impact of mining on the environment and to improve the well-being of local communities.
Mining collections of compounds with Screening Assistant 2

Directory of Open Access Journals (Sweden)

Guilloux Vincent

2012-08-01

Background: High-throughput screening assays have become the starting point of many drug discovery programs for large pharmaceutical companies as well as academic organisations. Despite the increasing throughput of screening technologies, the almost infinite chemical space remains out of reach, calling for tools dedicated to the analysis and selection of the compound collections intended to be screened. Results: We present Screening Assistant 2 (SA2), an open-source JAVA software dedicated to the storage and analysis of small to very large chemical libraries. SA2 stores unique molecules in a MySQL database, and encapsulates several chemoinformatics methods, among which: providers management, interactive visualisation, scaffold analysis, diverse subset creation, descriptors calculation, sub-structure / SMARTS search, similarity search and filtering. We illustrate the use of SA2 by analysing the composition of a database of 15 million compounds collected from 73 providers, in terms of scaffolds, frameworks, and undesired properties as defined by recently proposed HTS SMARTS filters. We also show how the software can be used to create diverse libraries based on existing ones. Conclusions: Screening Assistant 2 is a user-friendly, open-source software that can be used to manage collections of compounds and perform simple to advanced chemoinformatics analyses. Its modular design and growing documentation facilitate the addition of new functionalities, calling for contributions from the community. The software can be downloaded at http://sa2.sourceforge.net/.
Research trends on Big Data in Marketing: A text mining and topic modeling based literature analysis

Directory of Open Access Journals (Sweden)

Alexandra Amado

2018-01-01

Given the research interest in Big Data in Marketing, we present a research literature analysis based on a semi-automated text mining approach with the goal of identifying the main trends in this domain. In particular, the analysis focuses on relevant terms and topics related to five dimensions: Big Data, Marketing, Geographic location of authors’ affiliation (countries and continents), Products, and Sectors. A total of 1560 articles published from 2010 to 2015 were scrutinized. The findings revealed that research is split between technological and research domains, with Big Data publications not clearly aligning cutting-edge techniques toward Marketing benefits. Also, few inter-continental co-authored publications were found. Moreover, the findings show that research on Big Data applications in Marketing is still at an embryonic stage, making it essential to develop more direct efforts toward business for Big Data to thrive in the Marketing arena.
Characterization of particulate emissions from Australian open-cut coal mines: Toward improved emission estimates.

Science.gov (United States)

Richardson, Claire; Rutherford, Shannon; Agranovski, Igor

2018-06-01

Given the significance of mining as a source of particulates, accurate characterization of emissions is important for the development of appropriate emission estimation techniques for use in modeling predictions and to inform regulatory decisions. The currently available emission estimation methods for Australian open-cut coal mines relate primarily to total suspended particulates and PM10 (particulate matter with an aerodynamic diameter <10 μm) [...] available relating to the PM2.5 (<2.5 μm) [...] currently available emission estimation techniques, this paper presents results of sampling completed at three open-cut coal mines in Australia. The monitoring data demonstrate that the particulate size fraction varies for different mining activities, and that the region in which the mine is located influences the characteristics of the particulates emitted to the atmosphere. The proportion of fine particulates in the sample increased with distance from the source, with the coarse fraction being a more significant proportion of total suspended particulates close to the source of emissions. In terms of particulate composition, the results demonstrate that the particulate emissions are predominantly sourced from naturally occurring geological material, and coal comprises less than 13% of the overall emissions. The size fractionation exhibited by the sampling data sets is similar to that adopted in current Australian emission estimation methods but differs from the size fractionation presented in the U.S. Environmental Protection Agency methodology. Development of region-specific emission estimation techniques for PM10 and PM2.5 from open-cut coal mines is necessary to allow accurate prediction of particulate emissions to inform regulatory decisions and for use in modeling predictions.
PubMedPortable: A Framework for Supporting the Development of Text Mining Applications.

Science.gov (United States)

Döring, Kersten; Grüning, Björn A; Telukunta, Kiran K; Thomas, Philippe; Günther, Stefan

2016-01-01

Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
Mission creep or responding to wider security needs? The evolving role of mine action organisations in Armed Violence Reduction

Directory of Open Access Journals (Sweden)

Sharmala Naidoo

2013-04-01

Since the late 1980s, mine action organisations have focused their efforts on reducing the social, economic and environmental impacts of anti-personnel mines and other explosive remnants of war (ERW) through a broad range of activities, including survey, clearance, mine risk education (MRE), victim assistance, stockpile destruction and advocacy. In recent years, an increasing number of mine action organisations have been using their technical expertise and their capacity to operate in difficult environments to reduce armed violence and promote public safety. Several organisations now have armed violence reduction (AVR)-related policies, programmes and staff in place. Some may argue that this shift towards AVR is a diversion from the core mandate of mine action organisations. But does this represent a loss of focus and thereby ‘mission creep’ on the part of these organisations? This practice note examines the factors underlying the evolving role of mine action organisations, discusses how these new programmes are contributing to the wider domain of AVR and explores whether these new programmes have resulted in a loss of organisational focus.
Surface mining and land reclamation in Germany

Energy Technology Data Exchange (ETDEWEB)

Nephew, E.A.

1972-05-01

Mining and land restoration methods as well as planning and regulatory procedures employed in West Germany to ameliorate environmental impacts from large-scale surface mining are described. The Rhineland coalfield in North Rhine Westphalia contains some 55 billion tons of brown-coal (or lignite), making the region one of Europe's most important energy centers. The lignite is extracted from huge, open-pit mines, resulting in large areas of disturbed land. The German reclamation approach is characterized by planning and carrying out the mining process as one continuum from early planning to final restoration of land and its succeeding use. Since the coalfield is located in a populated region with settlements dating back to Roman times, whole villages lying in the path of the mining operations sometimes have to be evacuated and relocated. Even before mining begins, detailed concepts must be worked out for the new landscape which will follow: the topography, the water drainage system, lakes and forests, and the intended land-use pattern are designed and specified in advance. Early, detailed planning makes it possible to coordinate mining and concurrent land reclamation activities. The comprehensive approach permits treating the overall problem as a whole rather than dealing with its separate aspects on a piecemeal basis.

The BioLexicon: a large-scale terminological resource for biomedical text mining

Directory of Open Access Journals (Sweden)

Thompson Paul

2011-10-01

Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is
Environmental problems relating to uranium mining and milling

International Nuclear Information System (INIS)

Friedman, F.B.

1979-01-01

The regulation of uranium mining and milling as it relates to the environment is discussed. The industry is primarily under the jurisdiction of the federal government, administered by the Nuclear Regulatory Commission (NRC). This authority can in some instances be delegated to the states. Certain areas of jurisdiction have been given to the Environmental Protection Agency (EPA) by the courts. The Safe Drinking Water Act is discussed as it relates to in situ leach mining. The role of the Department of the Interior in regulating uranium mining, as described in the Federal Land Policy and Management Act of 1976, is discussed. The requirement for environmental impact statements prior to licensing by the NRC or the individual states is also discussed, as are air quality and radioactive waste disposal as they relate to uranium mining.
ANALYSIS OF WEB MINING APPLICATIONS AND BENEFICIAL AREAS

Directory of Open Access Journals (Sweden)

Khaleel Ahmad

2011-10-01

The main purpose of this paper is to study Web mining: its process, techniques, features, applications (e-commerce and e-business) and beneficial areas. Web mining has become more popular and is widely used in various application areas (such as business intelligence systems, e-commerce and e-business). E-commerce and e-business results are improved by applying mining techniques such as data mining and text mining; among all the mining techniques, Web mining performs best.
Methodology of reducing rock bump hazard during room and pillar mining of North Ural deep bauxite deposits

Directory of Open Access Journals (Sweden)

Д. В. Сидоров

2017-03-01

The article describes practical experience of using room and pillar mining (RAPM) under conditions of deep horizons and dynamic overburden pressure. It was found that rock pressure control methods that are effective at upper horizons do not meet safety requirements at current working depths, which is explained by changes in geodynamic processes during mining. With increasing depth, the geodynamic processes become more intense and the number of pillar and roof failures increases. At a depth of 800 m, failures of mine structures became massive and unpredictable, which raised the question of developing and implementing tools to assess whether the RAPM elements in use comply with the mining, geological, technical and geodynamic conditions of the North Ural bauxite deposits, and of further developing guidelines for safe mining under conditions of deep horizons and dynamic rock pressure. The article describes the causes of mine structure failures in workings as a function of natural and man-made factors, and determines possible hazards and the objects of geomechanical support. It also includes a compliance assessment of the tools used to calculate RAPM structures and to forecast and prevent tectonic rock bursts at the mines of OAO “Sevuralboksitruda” (SUBR). It describes the modernization and development of new geomechanical support for RAPM, taking natural and technogenic hazards into account. The article presents the results of experimental testing of new parameters of RAPM construction elements at SUBR mines, and data on the industrial implementation of the regulatory and guideline documents developed for these mines to identify valid parameters of RAPM elements at great depths.
30 CFR 795.1 - Scope and purpose.

Science.gov (United States)

2010-07-01

... Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SMALL OPERATOR ASSISTANCE PERMANENT REGULATORY PROGRAM-SMALL OPERATOR ASSISTANCE PROGRAM § 795.1 Scope and purpose. This part comprises the Small Operator Assistance Program (SOAP) and establishes the procedures for...
30 CFR 795.4 - Information collection.

Science.gov (United States)

2010-07-01

... Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SMALL OPERATOR ASSISTANCE PERMANENT REGULATORY PROGRAM-SMALL OPERATOR ASSISTANCE PROGRAM § 795.4 Information... will be used to determine if the applicants meet the requirements of the Small Operator Assistance...
30 CFR 795.3 - Definitions.

Science.gov (United States)

2010-07-01

... Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SMALL OPERATOR ASSISTANCE PERMANENT REGULATORY PROGRAM-SMALL OPERATOR ASSISTANCE PROGRAM § 795.3 Definitions. As used in... who has the authority and responsibility for overall management of the Small Operator Assistance...
Social Licensing in uranium mining: Experiences from the IAEA review of the planned Mkuju River Uranium Project, Tanzania

International Nuclear Information System (INIS)

Schnell, H.; Hilton, J.; Saint-Pierre, S.; Baldry, K.; Fan, Z.; Tulsidas, H.

2014-01-01

The IAEA Uranium Production Site Appraisal Team (UPSAT) programme is designed to assist Member States to enhance the operational performance and the occupational, public and environmental health and safety of uranium mining and processing facilities across all phases of the uranium production cycle. The scope of the appraisal process includes exploration, resource assessment, planning, environmental and social impact assessment, mining, processing, waste management, site management, remediation, and final closure. An UPSAT review was requested in 2010 by the United Republic of Tanzania (URT) to address the challenges the country is currently facing in developing its uranium mining and processing capability for the first time. The review, carried out from 27 May to 5 June 2013, aimed to appraise URT’s preparedness for overseeing the Uranium Production Cycle in general, while focusing in particular on the planned Mkuju River Project (MRP) in the south of the country. The UPSAT team was tasked to report its findings according to five primary areas: 1. Regulatory system; 2. Sustainable uranium production life cycle; 3. Health, Safety and Environment (HSE); 4. Social licensing; 5. Capacity building. The paper will discuss the key findings and suggestions that were provided to governmental stakeholders and the operator to improve the planned operations. (author)
Text Clustering Algorithm Based on Random Cluster Core

Directory of Open Access Journals (Sweden)

Huang Long-Jun

2016-01-01

Clustering has become a popular text mining technique, but very large data sets place higher demands on the accuracy and performance of text mining. To address the performance bottleneck of traditional text clustering algorithms, this paper proposes a text clustering algorithm with random features. It is a clustering algorithm based on text density that also uses neighboring heuristic rules; the concept of a random cluster is introduced, which effectively reduces the complexity of the distance calculation.
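The abstract above does not give the algorithm itself, so the following is only a minimal sketch of the general idea it hints at: pick a cluster core at random and compare each remaining document with that core only, rather than computing all pairwise distances. The function names, the bag-of-words representation, the cosine measure and the `threshold` parameter are all assumptions made for illustration; this is not the authors' method.

```python
import math
import random
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def random_core_clustering(texts, threshold=0.3, seed=0):
    """Hypothetical sketch: repeatedly pick a random core among the unassigned
    documents and group every unassigned document whose similarity to that
    core reaches the threshold. Each round compares documents with the core
    only, so no full pairwise distance matrix is built."""
    rng = random.Random(seed)
    vectors = [vectorize(t) for t in texts]
    unassigned = list(range(len(texts)))
    clusters = []
    while unassigned:
        core = rng.choice(unassigned)
        members = [
            i for i in unassigned if cosine(vectors[core], vectors[i]) >= threshold
        ]
        if core not in members:  # an empty document has similarity 0 with itself
            members.append(core)
        clusters.append(sorted(members))
        unassigned = [i for i in unassigned if i not in members]
    return clusters


if __name__ == "__main__":
    docs = [
        "uranium mining regulation",
        "uranium mining safety",
        "text mining clustering algorithm",
        "text clustering algorithm",
    ]
    print(random_core_clustering(docs, threshold=0.3, seed=1))
```

In this sketch the result can depend on which cores happen to be drawn, so a fixed `seed` is used to make runs repeatable; a real density-based method would need a principled way to choose or validate cores.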
Program plan for the National Uranium Mine Tailings Office

International Nuclear Information System (INIS)

1983-03-01

The National Uranium Mine Tailings Program was formed to conduct research into the long-term environmental behaviour of uranium mine tailings. This research is necessary to provide a data base upon which close-out criteria for uranium mines can be based. The research program to be carried out under the auspices of the National Tailings Program Office has, as its goal, the development of this data base, and the formulation of a series of reports based on that data base. These documents are to be designed to allow the uranium mining industry to produce site-specific close-out plans which will be acceptable to the regulatory authorities. This report addresses the program to be undertaken to meet the above broad objective. It focusses on defining in more specific and explicit terms what the program objectives need to be to meet the close-out requirements currently perceived by the regulatory agencies involved. These program objectives have been refined and summarized as follows: On close-out, the tailings site shall: 1. Meet currently accepted individual exposure criteria, and meet air and water quality regulations. 2. Ensure a predictable decline in release rates of contaminants to the environment. Ideally, this decline would be monotonic in nature. 3. Meet the ALARA principle both at present and into the long-term future. 4. Ensure that the management strategy or technologies employed in close-out shall be of a passive nature and not require ongoing institutional intervention. On the basis of these program objectives, this report identifies specific program products in terms of manuals of practice, guidelines, etc. that are to be produced as a result of program activity. These documents will effectively provide guidance on acceptable close-out technology to the uranium industry and regulatory agencies.
A new concept of assistive virtual keyboards based on a systematic review of text entry optimization techniques

Directory of Open Access Journals (Sweden)

Renato de Sousa Gomide

Introduction: Due to the increasing popularization of computers and the expansion of the internet, Alternative and Augmentative Communication technologies have been employed to restore the ability to communicate of people with aphasia and tetraplegia. Virtual keyboards are one of the most primitive mechanisms for alternatively entering text and play a very important role in accomplishing this task. However, text entry with this kind of keyboard is much slower than entering information through their physical counterparts. Many techniques and layouts have been proposed to improve the typing performance of virtual keyboards, each one addressing a different issue or solving a specific problem. However, not all of them are suitable for assisting people with severe motor impairment. Methods: In order to develop an assistive virtual keyboard with improved typing performance, we performed a systematic review of scientific databases. Results: We found 250 related papers, and 52 of them were selected for the review. We then identified eight essential virtual keyboard features, five methods to optimize data entry performance and five metrics to assess typing performance. Conclusion: Based on this review, we introduce a concept of an assistive, optimized, compact and adaptive virtual keyboard that gathers a set of suitable techniques such as: a new ambiguous keyboard layout, disambiguation algorithms, dynamic scan techniques, static text prediction of letters and words and, finally, the use of phonetic and similarity algorithms to reduce the user's typing error rate.
Living conditions of mine workers from eight mines in South Africa

CSIR Research Space (South Africa)

Pelders, Jodi L

2018-04-01

Full Text Available interviews with labour representatives, 14 focus groups with mine workers, and 875 questionnaires completed by mine workers. The use of single-sex hostels and hostel room occupancy rates has reduced, while the use of living-out allowances (LOAs) has increased...
Nuclear Legislation in OECD and NEA Countries. Regulatory and Institutional Framework for Nuclear Activities - United States

International Nuclear Information System (INIS)

2015-01-01

This country profile provides comprehensive information on the regulatory and Institutional Framework governing nuclear activities as well as a detailed review of a full range of nuclear law topics, including: mining regime; radioactive substances; nuclear installations; trade in nuclear materials and equipment; radiation protection; radioactive waste management; non-proliferation and physical protection; transport; and nuclear third party liability. The profile is complemented by reproductions of the primary legislation regulating nuclear activities in the country. Content: I. General Regulatory Regime: 1. Introduction; 2. Mining regime; 3. Radioactive substances, nuclear fuel and equipment (Special nuclear material; Source material; By-product material; Agreement state programmes); 4. Nuclear installations (Initial licensing; Operation and inspection, including nuclear safety; Operating licence renewal; Decommissioning; Emergency response); 5. Radiological protection (Protection of workers; Protection of the public); 6. Radioactive waste management (High-level waste; Low-level waste; Disposal at sea; Uranium mill tailings; Formerly Utilized Sites Remedial Action Program - FUSRAP); 7. Non-proliferation and exports (Exports of source material, special nuclear material, production or utilisation facilities and sensitive nuclear technology; Exports of components; Exports of by-product material; Exports and imports of radiation sources; Conduct resulting in the termination of exports or economic assistance; Subsequent arrangements; Technology exports; Information and restricted data); 8. Nuclear security; 9. Transport; 10. Nuclear third party liability; II. Institutional Framework: 1. Regulatory and supervisory authorities (Nuclear Regulatory Commission - NRC; Department of Energy - DOE; Department of Labor - DOL; Department of Transportation - DOT; Environmental Protection Agency - EPA); 2. Public and semi-public agencies: A. Cabinet-level departments (Department of
Statistical properties of mine tremor aftershocks

CSIR Research Space (South Africa)

Kgarume, TE

2010-02-01

Full Text Available Mine tremors and their aftershocks pose a risk to mine workers in the deep gold mines of South Africa. The statistical properties of mine-tremor aftershocks were investigated as part of an endeavour to assess the hazard and manage the risk. Data...
Impacts of Mining and Urbanization on the Qin-Ba Mountainous Environment, China

Directory of Open Access Journals (Sweden)

Xinliang Xu

2016-05-01

Full Text Available The Qin-Ba Ecological Functional Zone is a component of China’s ecological security pattern designed to protect the regional ecosystem and maintain biodiversity. However, due to the impact of mining and urban encroachment, the sustainability of the ecosystem in the Qin-Ba mountainous area is deteriorating. This paper uses remote sensing and a geographic information system (GIS) to examine the impacts of mining and urban encroachment on the environment in the Qin-Ba mountainous area. The results indicate that the total mined area in 2013 was 22 km² and is predicted to escalate. Results also show that the ecosystems in Fengxian County, Shaanxi Province and Baokang County, Hubei Province were most severely affected by mining. Urbanization in the Qin-Ba mountainous area has seen an increase of 85.58 km² in urban land use from 2010 to 2013. In addition, infrastructure development including airport construction, tourism resorts and real estate development in the Qin-Ba mountainous area has intensified environmental and biodiversity disturbances since large areas of forest have been cleared. Our results should provide insight and assistance to city planners and government officials in making informed decisions.
Neutralization and attenuation of metal species in acid mine drainage and mine leachates using magnesite: a batch experimental approach

CSIR Research Space (South Africa)

Masindi, Vhahangwele

2014-08-01

Full Text Available International Mine Water Association Conference – An Interdisciplinary Response to Mine Water Challenges, China University of Mining and Technology, China, 18-22 August 2014 Neutralization and Attenuation of Metal Species in Acid Mine Drainage and Mine...
Reproducibility of studies on text mining for citation screening in systematic reviews: Evaluation and checklist.

Science.gov (United States)

Olorisade, Babatunde Kazeem; Brereton, Pearl; Andras, Peter

2017-09-01

Independent validation of published scientific results through study replication is a pre-condition for accepting the validity of such results. In computational research, full replication is often unrealistic for independent results validation; therefore, study reproduction has been justified as the minimum acceptable standard to evaluate the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational field with high relevance for software engineering, medical research and other fields. However, there is little work so far on reproduction studies in the field. In this paper, we investigate the reproducibility of studies in this area based on information contained in published articles and we propose reporting guidelines that could improve reproducibility. The study was approached in two ways. Initially we attempted to reproduce results from six studies, which were based on the same raw dataset. Then, based on this experience, we identified steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is given the information provided on these steps. 33 articles were systematically assessed for reproducibility using this approach. Our work revealed that it is currently difficult if not impossible to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits reproducibility of about 80% of the studies assessed. Also, information about the machine learning algorithms is inadequate in about 27% of the papers. On the plus side, the third party software tools used are mostly free and available. The reproducibility potential of most of the studies can be significantly improved if more attention is paid to information provided on the datasets used, how they were partitioned and utilized, and
Dialogue enabling speech-to-text user assistive agent system for hearing-impaired person.

Science.gov (United States)

Lee, Seongjae; Kang, Sunmee; Han, David K; Ko, Hanseok

2016-06-01

A novel approach for assisting bidirectional communication between people with normal hearing and people with hearing impairment is presented. While existing assistive devices for the hearing-impaired, such as hearing aids and cochlear implants, are vulnerable to extreme noise conditions or post-surgery side effects, the proposed concept is an alternative approach in which spoken dialogue is achieved by employing a robust speech recognition technique that takes noisy environmental factors into consideration and requires no attachment to the human body. The proposed system is a portable device with an acoustic beamformer for directional noise reduction that performs speech-to-text transcription using a keyword spotting method. It is also equipped with an optimized user interface for hearing-impaired people, rendering device usage intuitive and natural across diverse domain contexts. The relevant experimental results confirm that the proposed interface design is feasible for realizing an effective and efficient intelligent agent for the hearing-impaired.
Trends in HIV Terminology: Text Mining and Data Visualization Assessment of International AIDS Conference Abstracts Over 25 Years.

Science.gov (United States)

Dancy-Scott, Nicole; Dutcher, Gale A; Keselman, Alla; Hochstein, Colette; Copty, Christina; Ben-Senia, Diane; Rajan, Sampada; Asencio, Maria Guadalupe; Choi, Jason Jongwon

2018-05-04

The language encompassing health conditions can also influence behaviors that affect health outcomes. Few published quantitative studies have been conducted that evaluate HIV-related terminology changes over time. To expand this research, this study included an analysis of a dataset of abstracts presented at the International AIDS Conference (IAC) from 1989 to 2014. These abstracts reflect the global response to HIV over 25 years. Two powerful methodologies were used to evaluate the dataset: text mining to convert the unstructured information into structured data for analysis and data visualization to represent the data visually to assess trends. The purpose of this project was to evaluate the evolving use of HIV-related language in abstracts presented at the IAC from 1989 to 2014. Over 80,000 abstracts were obtained from the International AIDS Society and imported into a Microsoft SQL Server database for data processing and text mining analyses. A text mining module within the KNIME Analytics Platform, an open source software, was then used to mine the partially processed data to create a terminology corpus of key HIV terms. Subject matter experts grouped the terms into categories. Tableau, a data visualization software, was used to visualize the frequency metrics associated with the terms as line graphs and word clouds. The visualized dashboards were reviewed to discern changes in terminology use across IAC years. The major findings identify trends in HIV-related terminology over 25 years. The term "AIDS epidemic" was dominantly used from 1989 to 1991 and then declined in use. In contrast, use of the term "HIV epidemic" increased through 2014. Beginning in the mid-1990s, the term "treatment experienced" appeared with increasing frequency in the abstracts. Use of terms identifying individuals as "carriers or victims" of HIV rarely appeared after 2008. Use of the terms "HIV positive" and "HIV infected" peaked in the early-1990s and then declined in use. The terms
Environmental regulatory failure and metal contamination at the Giap Lai pyrite mine, Northern Vietnam.

Science.gov (United States)

Håkan Tarras-Wahlberg, N; Nguyen, Lan T

2008-03-01

The causes for the failure in enforcement of environmental regulations at the Giap Lai pyrite mine in northern Vietnam are considered and the environmental impacts that are associated with this mine are evaluated. It is shown that sulphide-rich tailings and waste rock in the mining area represent significant sources of acid rock drainage (ARD). The ARD is causing elevated metal levels in downstream water bodies, which, in turn, represent a threat to both human health and aquatic ecosystems. Metal concentrations in impacted surface waters have increased after mine closure, suggesting that impacts are becoming progressively more serious. No post-closure remediation measures have been applied at the mine, in spite of the existence of environmental legislation and both central and regional institutions charged with environmental supervision and control. The research presented here provides further emphasis for the recommendation that, while government institutions may need to be strengthened, and environmental regulations need to be in place, true on-the-ground improvement in environmental quality in Vietnam and in many other developing countries requires an increased focus on promoting public awareness of industrial environmental issues.

Sustainable Supply Chain Based on News Articles and Sustainability Reports: Text Mining with Leximancer and DICTION

Directory of Open Access Journals (Sweden)

Dongwook Kim

2017-06-01

Full Text Available The purpose of this research is to explore sustainable supply chain management (SSCM) trends, and firms’ strategic positioning and execution with regard to sustainability in the textile and apparel industry based on news articles and sustainability reports. Further analysis of the rhetoric in chief executive officer (CEO) letters within sustainability reports is used to determine firms’ resoluteness, positive entailments, sharing of values, perception of reality, and sustainability strategy and execution feasibility. Computer-based content analysis is used for this research: Leximancer is applied for text analysis, while the dictionary-based text mining program DICTION and SPSS are used for rhetorical analysis. Overall, contents similar to the literature on environmental, social, and economic aspects of the triple bottom line (TBL) are observed; however, topics such as regulation, green incentives, and international standards are not readily observed. Furthermore, ethical issues, sustainable production, quality, and customer roles are emphasized in the texts analyzed. The CEO letter analysis indicates that listed firms show relatively low realism and high commonality, while North American firms exhibit relatively high commonality, and European firms show relatively high realism. The results will serve as a baseline for providing academia guidelines in SSCM research, and provide an opportunity for businesses to complement their sustainability strategies and executions.
Perceptions and Realities in Modern Uranium Mining - Extended Summary

International Nuclear Information System (INIS)

2014-01-01

Uranium mining and milling has evolved significantly over the years. By comparing currently leading approaches with outdated practices, the report demonstrates how uranium mining can be conducted in a way that protects workers, the public and the environment. Innovative, modern mining practices combined with strictly enforced regulatory standards are geared towards avoiding past mistakes made primarily during the early history of the industry when maximising uranium production was the principal operating consideration. Today's leading practices in uranium mining aim at producing uranium in an efficient and safe manner that limits environmental impacts to acceptable standards. As indicated in the report, the collection of baseline environmental data, environmental monitoring and public consultation throughout the life cycle of the mine enables verification that the facility is operating as planned, provides early warning of any potentially adverse impacts on the environment and keeps stakeholders informed of developments. Leading practice also supports planning for mine closure before mine production is licensed to ensure that the mining lease area is returned to an environmentally acceptable condition. The report highlights the importance of mine workers being properly trained and well equipped, as well as that of ensuring that their work environment is well ventilated so as to curtail exposure to radiation and hazardous materials and thereby minimise health impacts. (authors)
Mining: The beginning and the end of the nuclear cycle

International Nuclear Information System (INIS)

Walls, J.

1991-01-01

Mining is one of the world's oldest industries, with a rich history that has evolved into modern times. A new chapter in that history is currently being written in southeastern New Mexico at the Waste Isolation Pilot Plant (WIPP). The beginning phase of the nuclear industry occurred when uranium was mined from the underground and processed to develop the first fuel source for the nuclear industry. The WIPP may well be the final chapter in closing out the nuclear cycle, by the disposal of nuclear waste 2150 feet in the underground repository. At the WIPP, traditional procedures for underground mining activities have been significantly altered in order to ensure underground safety and project adherence to numerous regulatory requirements. Innovative techniques have been developed for the WIPP underground procedures, mining equipment, and operating environments. The mining emphasis is upon quality of the excavation, not, as in conventional mines, upon the production of ore.
Environmental considerations. Environmental impacts of uranium mining in South Texas

International Nuclear Information System (INIS)

Kallus, M.F.

1977-01-01

Recent investigations of uranium mining and milling activities in the Grants Mineral Belt of New Mexico revealed serious environmental problems associated with these activities. An investigation was undertaken in the South Texas Uranium Belt to determine whether or not similar or other environmental problems existed. The study describes: (1) the history of uranium mining and milling in South Texas, (2) the area economy and demography, (3) the occurrence of uranium ore and (4) the regulatory aspects of uranium mining and milling in South Texas. The commercial recovery and processing of uranium in this area is described in some detail. Exploration, open pit mining, in-situ solution mining and processing techniques for "yellowcake" (U3O8), the uranium product of the area, are discussed. The state and federal regulations pertinent to uranium mining and milling are summarized. Finally, the environmental effects of these activities are discussed and conclusions and recommendations are drawn.
30 CFR 795.5 - Grant application procedures.

Science.gov (United States)

2010-07-01

....5 Mineral Resources OFFICE OF SURFACE MINING RECLAMATION AND ENFORCEMENT, DEPARTMENT OF THE INTERIOR SMALL OPERATOR ASSISTANCE PERMANENT REGULATORY PROGRAM-SMALL OPERATOR ASSISTANCE PROGRAM § 795.5 Grant application procedures. A State intending to administer a Small Operator Assistance Program under a grant from...
Utilization of coal ash/coal combustion products for mine reclamation

International Nuclear Information System (INIS)

Dolence, R.C.; Giovannitti, E.

1997-01-01

Society's demand for an inexpensive fuel, combined with ignorance of the long term impacts, has left numerous scars on the Pennsylvania landscape. There are over 250,000 acres of abandoned surface mines with dangerous highwalls and water filled pits. About 2,400 miles of streams do not meet water quality standards because of drainage from abandoned mines. There are uncounted households without an adequate water supply due to past mining practices. Mine fires and mine subsidence plague many Pennsylvania communities. The estimated cost to reclaim these past scars is over $15 billion. The beneficial use of coal ash in Pennsylvania for mine reclamation and mine drainage pollution abatement projects increased during the past ten years. The increase is primarily due to procedural and regulatory changes by the Department of Environmental Protection (DEP). Prior to 1986, DEP required a mining permit and a separate waste disposal permit for the use of coal ash in backfilling and reclaiming a surface mine site. In order to eliminate the dual permitting requirements and promote mine reclamation, procedural changes now allow a single permit which authorizes both mining and the use of coal ash in reclaiming active and abandoned pits. The actual ash placement, however, must be conducted in accordance with the technical specifications in the solid waste regulations.
Learning gene regulatory networks from only positive and unlabeled data

Directory of Open Access Journals (Sweden)

Elkan Charles

2010-05-01

Full Text Available Abstract Background Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge is typically not available that a given pair of genes do not interact. Results A recent advance in research on data mining is a method capable of learning a classifier from only positive and unlabeled examples, that does not need labeled negative examples. Applied to the reconstruction of gene regulatory networks, we show that this method significantly outperforms the current state of the art of machine learning methods. We assess the new method using both simulated and experimental data, and obtain major performance improvement. Conclusions Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.
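The abstract does not spell out which positive-and-unlabeled (PU) algorithm is used, so the following is only a minimal sketch of one widely cited approach (Elkan and Noto, 2008), shown here as an assumption rather than as the paper's method. Under the assumption that known positives are a random sample of all true positives, a classifier trained to separate labeled from unlabeled pairs outputs a score g(x) that is proportional to the true probability of interaction: p(y=1|x) = g(x)/c, where c is estimated as the mean score on held-out labeled positives. The scores and gene-pair names below are hypothetical.

```python
def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = p(labeled | truly positive) as the mean classifier score
    g(x) over held-out examples that are known (labeled) positives."""
    if not scores_on_labeled_positives:
        raise ValueError("need at least one labeled positive")
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def calibrate(score, c):
    """Convert a labeled-vs-unlabeled score g(x) into an estimate of
    p(y=1 | x) = g(x) / c, clipped to 1.0 because g(x) is itself an estimate."""
    return min(1.0, score / c)


def rank_pairs(pair_scores, c):
    """Rank unlabeled gene pairs by calibrated interaction probability.

    `pair_scores` maps a (regulator, target) tuple to the raw score g(x).
    """
    calibrated = {pair: calibrate(s, c) for pair, s in pair_scores.items()}
    return sorted(calibrated.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    # Hypothetical scores from some classifier trained on labeled vs unlabeled.
    c = estimate_label_frequency([0.4, 0.5, 0.6])
    unlabeled = {("geneA", "geneB"): 0.3, ("geneA", "geneC"): 0.05}
    for pair, prob in rank_pairs(unlabeled, c):
        print(pair, round(prob, 3))
```

The point of the calibration is that no negative examples are needed: unlabeled pairs are treated as a mixture, and the constant c corrects the resulting underestimate.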
The African Health Profession Regulatory Collaborative for Nurses and Midwives

Directory of Open Access Journals (Sweden)

McCarthy Carey F

2012-08-01

Full Text Available Abstract Background More than thirty-five sub-Saharan African countries have severe health workforce shortages. Many also struggle with a mismatch between the knowledge and competencies of health professionals and the needs of the populations they serve. Addressing these workforce challenges requires collaboration among health and education stakeholders and reform of health worker regulations. Health professional regulatory bodies, such as nursing and midwifery councils, have the mandate to reform regulations yet often do not have the resources or expertise to do so. In 2011, the United States of America Centers for Disease Control and Prevention began a four-year initiative to increase the collaboration among national stakeholders and help strengthen the capacity of health professional regulatory bodies to reform national regulatory frameworks. The initiative is called the African Health Regulatory Collaborative for Nurses and Midwives. This article describes the African Health Regulatory Collaborative for Nurses and Midwives and discusses its importance in implementing and sustaining national, regional, and global workforce initiatives. Discussion The African Health Profession Regulatory Collaborative for Nurses and Midwives convenes leaders responsible for regulation from 14 countries in East, Central and Southern Africa. It provides a high profile, south-to-south collaboration to assist countries in implementing joint approaches to problems affecting the health workforce. Implemented in partnership with Emory University, the Commonwealth Secretariat, and the East, Central and Southern African College of Nursing, this initiative also supports four to five countries per year in implementing locally-designed regulation improvement projects. Over time, the African Health Regulatory Collaborative for Nurses and Midwives will help to increase the regulatory capacity of health professional organizations and ultimately improve regulation and
New Zealand mining legislation and recommendations for change following the Pike River disaster

International Nuclear Information System (INIS)

King, Tony

2012-01-01

There is good evidence that existing health and safety legislation in New Zealand (NZ) has produced a marked and sustained improvement in occupational health and in high-frequency low-consequence accidents. The same cannot be said for high-consequence low-frequency events. Changes to the regulatory framework should focus on these high-consequence low-frequency events although not to the detriment of low-consequence event safety. The NZ underground coal mining industry is characterised by a very small number of operating mines that are distinctly different from each other. This is also true of proposed mines, where there is even more variety in the type of firms proposing to operate these mines. The risks that each individual mine faces are varied and risks that predominate in one operation (for example methane or spontaneous combustion) may be entirely absent at another nearby operation. Research strongly suggests that the best regulatory approach for underground coal mines in NZ is process-based standards, supported by performance standards to identify issues and set appropriate performance outcomes. A well-resourced locally based inspectorate comprising knowledgeable and experienced inspectors with NZ mining experience is required, supported by access to overseas expertise. The Health and Safety in Employment Act 1992 should remain the governing legislation for NZ coal mining. The existing mining-specific regulations should be replaced for underground coal mining with a new set of regulations. These new regulations should draw heavily on the Queensland regulations with the best aspects of New South Wales and elsewhere also included. The internationalisation of management, advisors and workers supports an approach based on good overseas practice, rather than a highly-individualised, uniquely NZ solution. Learnings and recommendations from the Pike River Royal Commission should be incorporated into performance standards and outcomes in the new regulations.
Nuclear Regulatory Legislation

International Nuclear Information System (INIS)

1989-08-01

This compilation of statutes and material pertaining to nuclear regulatory legislation through the 100th Congress, 2nd Session, has been prepared by the Office of the General Counsel, US Nuclear Regulatory Commission, with the assistance of staff, for use as an internal resource document. Persons using this document are placed on notice that it may not be used as an authoritative citation in lieu of the primary legislative sources. Furthermore, while every effort has been made to ensure the completeness and accuracy of this material, neither the United States Government, the Nuclear Regulatory Commission, nor any of their employees makes any expressed or implied warranty or assumes liability for the accuracy or completeness of the material presented in this compilation
Best Practice in Environmental Management of Uranium Mining

International Nuclear Information System (INIS)

2010-01-01

generation as an integral part of the strategy of many countries to mitigate their impacts on climate change. The existing uranium mining industry has raised environmental standards through the introduction and development of best practices. One concern is that some of the newer, junior, mining companies and producer nations entering the market in the present expansion phase may not be aware of these best practices and current international standards. Failure to maintain the current high levels of environmental management may see the uranium mining industry's development hampered through the poor performance of a few new but inexperienced companies, which would result in adverse reactions from the public and regulating authorities. This could be especially damaging to the straightforward development of the new resources demanded by the market. As part of a strategy to assist in the maintenance of standards in uranium mining and to assist in the dissemination of information on best practices, the IAEA assisted in the organization of a Technical Meeting on Best Practices in Environmental Management of Uranium Production Facilities, in Saskatoon, Canada, from 22 to 25 June 2004. This report contains the papers presented at that meeting, and the conclusions reached in discussions, together with an overall guide to what is best practice in modern uranium mining.
The South African mining industry

International Nuclear Information System (INIS)

Langton, G.

1982-01-01

This paper covers six of the many mining and associated developments in South Africa. These are: (1) Deep level gold mining at Western Deep Levels Limited - (2) Palabora Mining Company Limited - SA's unique copper mine - (3) Production of steel and vanadium-rich slag at Highveld Steel and Vanadium Corporation - (4) Coal mining at Kriel and Kleinkopje Collieries - (5) A mass mining system for use below the Gabbro Sill at Premier Diamond Mine - (6) Uranium production - joint metallurgical scheme- Orange Free State Gold Mines. - For publication in this journal the original paper has been summarised. Should any reader wish to have the full text in English he should write to the author at the address below. (orig.) [de
Exploratory analysis of textual data from the Mother and Child Handbook using a text mining method (II): Monthly changes in the words recorded by mothers.

Science.gov (United States)

Tagawa, Miki; Matsuda, Yoshio; Manaka, Tomoko; Kobayashi, Makiko; Ohwada, Michitaka; Matsubara, Shigeki

2017-01-01

The aim of the study was to examine the possibility of converting subjective textual data written in the free column space of the Mother and Child Handbook (MCH) into objective information using text mining and to compare any monthly changes in the words written by the mothers. Pregnant women without complications (n = 60) were divided into two groups according to State-Trait Anxiety Inventory grade: low trait anxiety (group I, n = 39) and high trait anxiety (group II, n = 21). Exploratory analysis of the textual data from the MCH was conducted by text mining using the Word Miner software program. Using 1203 structural elements extracted after processing, a comparison of monthly changes in the words used in the mothers' comments was made between the two groups. The data was mainly analyzed by a correspondence analysis. The structural elements in groups I and II were divided into seven and six clusters, respectively, by cluster analysis. Correspondence analysis revealed clear monthly changes in the words used in the mothers' comments as the pregnancy progressed in group I, whereas the association was not clear in group II. The text mining method was useful for exploratory analysis of the textual data obtained from pregnant women, and the monthly change in the words used in the mothers' comments as pregnancy progressed differed according to their degree of unease. © 2016 Japan Society of Obstetrics and Gynecology.
Assessing semantic similarity of texts - Methods and algorithms

Science.gov (United States)

Rozeva, Anna; Zerkova, Silvia

2017-12-01

Assessing the semantic similarity of texts is an important part of different text-related applications like educational systems, information retrieval, text summarization, etc. This task is performed by sophisticated analysis, which implements text-mining techniques. Text mining involves several pre-processing steps that yield a structured, representative model of the documents in a corpus by extracting and selecting the features that characterize their content. Generally, the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, are presented.
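The LSA pipeline summarized above (term-document matrix, rank reduction, similarity in the reduced space) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses raw term counts instead of a weighting scheme such as tf-idf, whitespace tokenization, and a plain power iteration in place of a library SVD so that it runs with the standard library only. All function names are hypothetical. It relies on the identity that, for a term-by-document matrix A = U S V^T, the eigenvectors of A^T A are the columns of V and its eigenvalues are the squared singular values, so document j has latent coordinates sigma_i * V[j][i].

```python
import math
import random
from collections import Counter


def term_document_matrix(docs):
    """Rows are terms, columns are documents; entries are raw term counts."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return [[c[term] for c in counts] for term in vocab]


def gram_matrix(matrix):
    """Document-by-document matrix A^T A for a term-by-document matrix A."""
    n_docs = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]


def top_eigenpairs(sym, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(sym)
    work = [[float(x) for x in row] for row in sym]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        wv = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, wv))  # Rayleigh quotient, |v| = 1
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Coordinates of each document in the k-dimensional latent space."""
    pairs = top_eigenpairs(gram_matrix(term_document_matrix(docs)), k)
    return [[math.sqrt(max(lam, 0.0)) * vec[j] for lam, vec in pairs]
            for j in range(len(docs))]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    corpus = [
        "mine tailings water quality",
        "tailings water contamination mine drainage",
        "text mining document similarity",
        "document similarity text semantics",
    ]
    vecs = lsa_document_vectors(corpus, k=2)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy corpus the two tailings documents share only some of their words, yet in the two-dimensional latent space they should land very close together and far from the two text-mining documents, which is the dimensionality-reduction effect the abstract attributes to LSA.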
Nuclear Legislation in OECD and NEA Countries. Regulatory and Institutional Framework for Nuclear Activities - Turkey

International Nuclear Information System (INIS)

2008-01-01

This country profile provides comprehensive information on the regulatory and Institutional Framework governing nuclear activities as well as a detailed review of a full range of nuclear law topics, including: mining regime; radioactive substances; nuclear installations; trade in nuclear materials and equipment; radiation protection; radioactive waste management; non-proliferation and physical protection; transport; and nuclear third party liability. The profile is complemented by reproductions of the primary legislation regulating nuclear activities in the country. Content: I. General regulatory regime: 1. Introduction; 2. Mining regime; 3. Radioactive substances, nuclear fuel and equipment; 4. Nuclear installations; 5. Trade in nuclear materials and equipment; 6. Radiation protection; 7. Radioactive waste management; 8. Nuclear security; 9. Transport; 10. Nuclear third party liability; II. Institutional Framework: 1. Regulatory and supervisory authorities (Prime Minister; Ministry of Energy and Natural Resources; Ministry of Health; Ministry of the Environment and Forestry); 2. Public and semi-public agencies (Turkish Atomic Energy Authority - TAEK; General Directorate for Mineral Research and Exploration - MTA; ETI Mine Works General Management; Turkish Electric Generation and Transmission Corporation - TEAS; Turkish Electricity Distribution Corporation - TEDAS)
An investigation on natural radioactivity from mining industry ...

African Journals Online (AJOL)

Mining originating industries such as the coal industries, petroleum extraction and processing and natural gas, mining enrichment waste, phosphate, ...
13 CFR 121.510 - What is the size standard for leasing of Government land for uranium mining?

Science.gov (United States)

2010-01-01

... 13 Business Credit and Assistance 1 2010-01-01 2010-01-01 false What is the size standard for leasing of Government land for uranium mining? 121.510 Section 121.510 Business Credit and Assistance... standard for leasing of Government land for uranium mining? A concern is small for this purpose if it...
The BioLexicon: a large-scale terminological resource for biomedical text mining

Science.gov (United States)

2011-01-01

Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical
Establishing exemption and clearance criteria by the regulatory authority

International Nuclear Information System (INIS)

Salih, A.E.A.

2012-04-01

This project work discusses the relationship between the concepts of exemption and clearance, and their practical use in the overall scheme of regulatory control of practices. It also discusses how exemption and clearance levels are established and the scope of their application for regulatory control. The concept of general clearance levels for any type of material and any possible pathway of disposal is also introduced in this work. Guidance from the Group of Experts establishing scenarios for general clearance, parameter values, and a nuclide-specific list of calculated clearance levels is also presented. Regulatory authorities are required to develop guidance on exemption and clearance levels to help licensees and registrants know which practices and sources within practices are exempted from regulatory control and which are to be cleared from further controls. Exemption and clearance levels are tools for assisting the Regulatory Authority to optimize the use of resources. (author)
Categorization of Survey Text Utilizing Natural Language Processing and Demographic Filtering

Science.gov (United States)

2017-09-01

primary topics into bins using more traditional visual text mining methods including Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003...traditional visual text mining methods to identify primary topics among the comments. Once labels are determined, we use them to choose words or phrases...tm: Text Mining Package. Retrieved from https://CRAN.R-project.org/package=tm Grün, B., & Hornik, K. (2011). topicmodels: An R package for fitting

Philippine Mining Capitalism: The Changing Terrains of Struggle in the Neoliberal Mining Regime

Directory of Open Access Journals (Sweden)

Alvin A. Camba

2016-06-01

Full Text Available This article analyzes how the mining sector and anti-mining groups compete for mining outcomes in the Philippines. I argue that the transition to a neoliberal mineral regime has empowered the mining sector and weakened the mining groups by shifting the terrains of struggle onto the domains of state agencies and scientific networks. Since the neoliberal era, the mining sector has come up with two strategies. First, technologies of subjection elevate various public institutions to elect and select the processes aimed at making mining accountable and sensitive to the demands of local communities. However, they often refuse or lack the capacity to intervene effectively. Second, technologies of subjectivities allow a selective group of industry experts to single-handedly determine the environmental viability of mining projects. Mining consultants, specialists, and scientists chosen by mining companies determine the potential environmental damage on water bodies, air pollution, and soil erosion. Because of the mining capital’s access to economic and legal resources, anti-mining communities across the Philippines have been forced to compete on an unequal terrain for a meaningful social dialogue and mining outcomes.
Optimized mine ventilation on demand (OMVOD)

International Nuclear Information System (INIS)

Anderson, M.

2009-01-01

This paper provided an overview of the Optimized Mine Ventilation on Demand (OMVOD) system that is being installed at Xstrata Nickel Rim South Project and at Vale Inco's Totten Mine in Sudbury. The OMVOD system is designed to dynamically monitor and control air quality and quantity in real time and dilute and remove hazardous substances including diesel particulate matter (DPM), carbon monoxide (CO) and nitrogen dioxide (NO2). It is also designed to control the thermal environment and provide ventilation for humans as well as mobile equipment engine combustion according to regulatory standards. The paper highlighted the OMVOD system optimization of energy, air quality measurement and control and production management of the mines through real time dynamic automation. Topics of discussion included real-time tracking and monitoring of diesel equipment; real-time tracking of underground miners; real-time evaluation of mine ventilation networks; and real-time control and optimization of ventilation equipment. ABB and Simsmart Technologies have joined forces to provide underground mining customers with a ventilation optimization solution. Simsmart's OMVOD provides proven real time/dynamic automation technology to significantly reduce energy costs, provide health and safety benefits as well as major capital cost savings while realizing an increase in production.
OCHRE PRECIPITATES AND ACID MINE DRAINAGE IN A MINE ENVIRONMENT

Directory of Open Access Journals (Sweden)

BRANISLAV MÁŠA

2012-03-01

Full Text Available This paper characterizes the ochre precipitates and the mine water effluents of some old mine adits and settling pits left after the mining of polymetallic ores in Slovakia. It was shown that the mine water effluents from two different types of deposits (adits; settling pits) have similar composition and represent slightly acidic sulphate water (pH in the range 5.60-6.05, sulphate concentration from 1160 to 1905 g.dm-3). The ochreous precipitates were characterized by X-ray diffraction analysis (XRD), scanning electron microscopy (SEM) and the B.E.T. method for measuring the specific surface area and porosity. The dominant phases were ferrihydrite with goethite or goethite with lepidocrocite.
The US uranium mining industry: 1980 and today

International Nuclear Information System (INIS)

Stover, D.E.

1991-01-01

In 1980, 16 800 tonnes of uranium were produced in the United States, making it the largest producing nation with about 40% of Western World (WOCA) production. By 1990, US production had fallen to approximately 3500 tonnes U, representing only about 10% of WOCA production. Clearly the US uranium mining industry was strongly altered by the events of the intervening years. Widespread focus on declining prices overshadowed a second important set of events. Namely, the rapidly changing regulatory and environmental atmosphere in the United States which continues adversely to affect conventional uranium mining. As a result of these events, the size and structure of the US uranium mining industry was irrevocably changed. Within this altered industry is a rapidly maturing technology that provides a more efficient and lower-cost means of uranium production, in-situ leaching (ISL). By exploiting the advantages of relatively low capital investments, shorter development times, reduced labour costs, and increased production flexibility of ISL mining, the US uranium mining industry will be a competitive component of the world's uranium supply for the 1990s. (author)
Text mining electronic hospital records to automatically classify admissions against disease: Measuring the impact of linking data sources.

Science.gov (United States)

Kocbek, Simon; Cavedon, Lawrence; Martinez, David; Bain, Christopher; Manus, Chris Mac; Haffari, Gholamreza; Zukerman, Ingrid; Verspoor, Karin

2016-12-01

Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the research question regarding the impact of the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. Overall, linking data sources significantly improved classification performance for all the diseases examined. However, there is no single approach that suits all scenarios; the choice of the
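The evaluation metrics named in the record above (Precision, Recall and F-Score) can be computed as in the following minimal sketch. The function name, the label encoding (1 for an admission marked positive for a disease, 0 otherwise) and the toy labels are assumptions for illustration only; they are not taken from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class, from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold labels and classifier predictions for eight admissions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

With imbalanced data such as the medical records described above, accuracy alone can look high even when few positive admissions are found, which is one reason to report precision and recall for the positive class.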
Mining robotics sensors

CSIR Research Space (South Africa)

Green, JJ

2012-04-01

Full Text Available ... of three-dimensional cameras (SR 4000 and XBOX Kinect) and a thermal imaging sensor (FLIR A300) in order to create 3D thermal models of narrow mining stopes. This information can be used in determining the risk of rockfall in an underground mine, which is a major...
In-mine (tunnel-to-tunnel) electrical resistance tomography in South African platinum mines

CSIR Research Space (South Africa)

Van Schoor, Abraham M

2009-12-01

Full Text Available The applicability of tunnel-to-tunnel electrical resistance tomography (ERT) for imaging disruptive geological structures ahead of mining, in an igneous platinum mining environment is assessed. The geophysical targets of interest are slump...
The psychosocial impacts of fly-in fly-out and drive-in drive-out mining on mining employees: a qualitative study.

Science.gov (United States)

Torkington, Amanda May; Larkins, Sarah; Gupta, Tarun Sen

2011-06-01

To explore how fly-in fly-out (FIFO) and drive-in drive-out (DIDO) mining affects the psychosocial well-being of miners resident in a rural north Queensland town as well as the sources of support miners identify and use in managing these effects. A descriptive qualitative study, using semistructured interviews. Charters Towers, a rural town in north Queensland, and a remote north-western Queensland mine. Eleven people, resident in or near Charters Towers, currently or formerly employed in FIFO or DIDO mining. Self-reported effects on psychosocial well-being and sources of support. Participants reported positive and negative psychosocial impacts across domains including family life, relationships, social life, work satisfaction, mood, sleep and financial situation. Concerns about the impact on participants' partners were described. Awareness of onsite support, such as Employee Assistance Programs, varied. Other supports included administration staff and nurses or medics. Trusted friends or colleagues at the mine site were considered a preferred means of support. Some, but not most, had experienced coworkers discussing problems with them. A reluctance to seek support was described, with a number of barriers identified. Those having problems might not recognise their own stress and thus not seek support. This study identifies numerous psychosocial impacts on FIFO/DIDO miners and their partners, and provides insights into preferences regarding support. Employee Assistance Programs cannot be relied upon as the sole means of support. Further studies exploring the impact upon and supports for FIFO/DIDO workers and their partners will assist in better understanding these issues. © 2011 The Authors. Australian Journal of Rural Health © National Rural Health Alliance Inc.
Liability for damage caused by ground subsidence in the Netherlands. The role of the Mining Law and the Technical Committee Ground Subsidence

International Nuclear Information System (INIS)

Roggenkamp, M.M.; Verwer, Ch.P.

2004-01-01

This article provides an overview of the legal regulatory framework in respect of movements of the soil (i.e. subsidence and earth tremors) following the exploration and extraction of minerals in the Netherlands, and the liability for the damage they cause. This legal framework has been changed considerably since the new Mining Act came into force on January 1st, 2003. After having examined the causes of subsidence and subsequent earth tremors, and relationships with the exploration and extraction of subsoil minerals such as oil, gas, salt and coal, the article continues by presenting the legislation of this area. The authors analyse the applicable legislation before as well as after the introduction of the new Mining Act. The two judicial regimes have a similar approach: while the rules and regulations concerning earth movements are laid down in the Mining Act, the legal foundation for the liability for damage resulting from earth movements is provided by the Civil Code. The parliamentary debates on the Mining Bill specifically dealt with the issue of earth movements and the question whether either a system of absolute (vicarious) liability would apply, or a system of strict liability. One of the reasons for not having a system of absolute liability was the wish of Parliament to lay down in the Mining Act provisions for the creation of a Technical Committee on Earth Movements. It is the remit of this Committee to advise the Minister of Economic Affairs on all matters related to movements of the soil. Its duty is also to advise on the causal relationship between mining activities and earth movements, and the amount of damages to be paid by the mining companies, at the request of individual persons. In order to avoid individuals not receiving any compensation for damages, the new Mining Act also calls for the introduction of a special Fund for Mining Damages. Individual persons would be entitled to make a claim to this fund in situations such as the mining company
High utility-itemset mining and privacy-preserving utility mining

Directory of Open Access Journals (Sweden)

Jerry Chun-Wei Lin

2016-03-01

Full Text Available In recent decades, high-utility itemset mining (HUIM) has emerged as a critical research topic, since both the quantity and profit factors are considered when mining high-utility itemsets (HUIs). Generally, data mining is commonly used to discover interesting and useful knowledge from massive data. It may, however, lead to privacy threats if private or secure information (e.g., HUIs) is published in public or misused. In this paper, we focus on the issues of HUIM and privacy-preserving utility mining (PPUM), and present two evolutionary algorithms to mine HUIs and to hide the sensitive high-utility itemsets in PPUM, respectively. Extensive experiments showed that the two proposed models for the applications of HUIM and PPUM can not only generate high-quality profitable itemsets according to the user-specified minimum utility threshold, but also preserve the privacy of private or secure information (e.g., HUIs) in real-world applications.
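To make the notion of a high-utility itemset concrete, the sketch below computes itemset utilities (purchased quantity times unit profit, summed over the transactions that contain the whole itemset) and keeps those at or above a minimum utility threshold. It uses exhaustive enumeration, not the evolutionary algorithms the record describes, and the function names, transactions and profit table are hypothetical.

```python
from itertools import combinations

def itemset_utility(itemset, transactions, profit):
    """Sum of quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total

def high_utility_itemsets(transactions, profit, min_util):
    """Brute-force search: every itemset whose utility reaches min_util."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, profit)
            if u >= min_util:
                result[itemset] = u
    return result

# Hypothetical data: each transaction maps item -> purchased quantity.
profit = {"a": 5, "b": 2, "c": 1}
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
    {"a": 1, "b": 1, "c": 1},
]
print(high_utility_itemsets(transactions, profit, min_util=16))
```

Exhaustive enumeration grows exponentially with the number of distinct items, and utility is not anti-monotone (a superset can have higher or lower utility than its subsets), so frequency-style pruning does not apply directly. That cost is a common motivation for heuristic search such as the evolutionary approach mentioned in the abstract.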
Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

Directory of Open Access Journals (Sweden)

André SANTOS

2012-07-01

Full Text Available Scientific publications are the main vehicle to disseminate information in the field of biotechnology for wastewater treatment. Indeed, the new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, more error-prone and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs are hardly reaching engineers, hampering the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
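The kind of numerical parameter extraction described in the record above can be approximated, in a very reduced form, by pattern matching. The sketch below is not the authors' workflow: the parameter lexicon, the unit list, the connecting words and the sample sentence are all hypothetical, and a real pipeline would need far richer linguistic processing.

```python
import re

# Hypothetical mini-lexicons; a real workflow would use curated dictionaries.
PARAMETERS = ["pH", "temperature", "COD"]
UNITS = ["mg/L", "g/L", "°C"]

_param = "|".join(re.escape(p) for p in PARAMETERS)
# Longest units first so that "mg/L" is tried before "g/L".
_unit = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

PATTERN = re.compile(
    r"\b(?P<param>" + _param + r")\s+(?:of|was|at|=)\s+"   # parameter + connector
    r"(?P<value>\d+(?:\.\d+)?)"                              # integer or decimal value
    r"(?:\s*(?P<unit>" + _unit + r")(?![A-Za-z]))?"          # optional unit
)

def extract_parameters(text):
    """Return (parameter, value, unit) triples; unit is None when absent."""
    return [
        (m.group("param"), float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(text)
    ]

sentence = ("The reactor was operated at a temperature of 35 °C and a pH of 7.2, "
            "with an influent COD of 1200 mg/L.")
for triple in extract_parameters(sentence):
    print(triple)
```

A dictionary-plus-regex approach like this is easy to audit but brittle: it misses paraphrases ("kept near neutral pH"), ranges and values reported in tables, which is consistent with the limitations the abstract says the authors identify in the overall process.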
Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

Directory of Open Access Journals (Sweden)

Anália LOURENÇO

2013-07-01

Full Text Available Scientific publications are the main vehicle to disseminate information in the field of biotechnology for wastewater treatment. Indeed, the new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, more error-prone and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs are hardly reaching engineers, hampering the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
Use of Data Mining Techniques to Detect Medical Fraud in Health Insurance

Directory of Open Access Journals (Sweden)

Kuo-Chung Lin

2012-04-01

Full Text Available The inspection of health insurance claim applications usually relies on experts' experience for verification and on experienced personnel for checking. However, because of the heavy workload and the shortage of manpower and experience, the rate of misjudgment is high, leading to improper settlement of claims and a waste of social resources. This paper uses data-mining technology to design models that identify cases requiring manual inspection, so as to save time and manpower. Six models are designed in this paper. Through analysis based on the 20/80 principle and on the coverage and accuracy ratios, a large amount of periodic data (over 2 million records) is fed back to the data-mining models after repeated verification. It is also found that integrating data-mining technology and feeding its results back to different business stages, so as to establish an early warning system, will be an important topic for the health insurance system in hospitals' EMR in the future. Meanwhile, the information acquired by data mining needs to be stored, and traditional database technology has limitations. As future work, this paper explores an ontology framework to be built with semantic network technology in order to assist the storage of knowledge gained by data mining.
Confusion in regulating coal mine water pollution: Regulatory overlap in SMCRA and the CWA

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-11-01

Whenever a government uses two major pieces of legislation to combat a single public enemy, complaints of over-regulation and questions of jurisdiction from the individuals and industries affected are inevitable. Acid mine drainage (AMD) is a harmful and elusive enemy which threatens the integrity of our nation's waters. The threat it poses to our environment cannot be solved without the awesome power of government; however, a fair and consistent enforcement of these two acts is imperative. The mining industry's push to exempt landowners from liability for the acid discharges from abandoned mines is questionable in light of the serious AMD problems from these sources. The burden imposed on landowners to take abatement measures under the CWA is far outweighed by the continuing threat of abandoned mine AMD. The state environmental authorities who must completely reclaim these abandoned mine lands must pursue the landowners and make them pay the costs. In order to accomplish this, the state SMCRA regulators must increase coordination with the EPA's state counterparts. The deficit problem in the ANL trust fund likely to improve anytime soon. Since SMCRA prohibits holding landowners liable for reclamation costs, the only way the abandoned mine AMD problem can be effectively remedied is by state environmental authorities seeking sanctions under the probably correct in claiming that the current CWA laws governing in-stream impoundments are overly burdensome. The EPA's interest in protecting the quality of industrial impoundments that have no meaningful wetlands or recreational use seems to serve no rational purpose, especially in light of the onerous burden it places on coal operators attempting to comply with the CWA.
ASSISTments Dataset from Multiple Randomized Controlled Experiments

Science.gov (United States)

Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

2016-01-01

In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…
Managing environmental and health impacts of uranium mining

Energy Technology Data Exchange (ETDEWEB)

Vance, R.E.; Cameron, R., E-mail: robert.vance@oecd.org, E-mail: ron.cameron@oecd.org [OECD Nuclear Energy Agency (France)

2014-07-01

As the raw material that fuels nuclear power plants that generate significant amounts of electricity with full life cycle carbon emissions as low as renewable energy sources, uranium is a valuable commodity. Yet uranium mining remains controversial, principally because of environmental and health impacts created when mining was undertaken by governments to meet Cold War strategic requirements. Uranium mining is conducted under significantly different circumstances today. Since the era of military production, societal expectations of environmental protection and the safety of workers and the public have evolved as the outcomes of the early era of mining became apparent, driving changes in regulatory oversight and mining practices. Key aspects of leading practice uranium mining are presented (conventional worker health and safety, worker radiation protection, public health and safety, water quality, tailings and waste rock management) and compared with historic practices to demonstrate the scale of differences. The application of additional aspects of uranium mine life cycle management (public consultation, environmental impact assessment, analysis of socio-economic impacts/benefits, environmental monitoring, financial assurance, product transport, security and safeguards, emergency planning and knowledge transfer), introduced as the industry matured, enhance overall management practices for the long term. Results from several case studies show that improved management of key aspects of uranium mining, combined with the incorporation of new life cycle parameters, have transformed the industry into the most regulated and arguably one of the safest and environmentally responsible types of mining in the world. (author)
Nuclear Legislation in OECD and NEA Countries. Regulatory and Institutional Framework for Nuclear Activities - Iceland

International Nuclear Information System (INIS)

2008-01-01

This country profile provides comprehensive information on the regulatory and Institutional Framework governing nuclear activities as well as a detailed review of a full range of nuclear law topics, including: mining regime; radioactive substances; nuclear installations; trade in nuclear materials and equipment; radiation protection; radioactive waste management; non-proliferation and physical protection; transport; and nuclear third party liability. The profile is complemented by reproductions of the primary legislation regulating nuclear activities in the country. Content: I. General Regulatory Regime: 1. Introduction; 2. Mining regime; 3. Radioactive substances and equipment; 4. Nuclear installations; 5. Trade in nuclear materials and equipment; 6. Radiation protection; 7. Radioactive waste management; 8. Nuclear security; 9. Transport; 10. Nuclear Third Party Liability; II. Institutional Framework: 1. Regulatory and supervisory authorities (Minister of Health and Social Security; Icelandic Radiation Protection Institute)
MONITORING OF MINING

Directory of Open Access Journals (Sweden)

Berislav Šebečić

1996-12-01

Full Text Available The way mining was monitored in the past depended on knowledge, interest and the existing legal regulations. Documentary evidence about this work can be found in archives, libraries and museums. In particular, there is rich archival material (papers and books) concerning the work of the one-time Imperial and Royal Mining Captaincies in Zagreb, Zadar, Klagenfurt and Split. A minor part of the documentation has not yet been transferred to Croatia. From mining handbooks and books we can also find out about mining in Croatia in the context of Austro-Hungary. For example, we can find out that the first governorships in Zagreb and Zadar, headed by the Ban, Count Jelacic, and by Baron Mamula, were also the top mining authorities, though this, probably from political motives, was suppressed in the guides and inventories of the Mining Captaincies. At the end of the 1850s, Croatia produced 92-94% of sea salt, up to 8.5% of sulphur, 19.5% of asphalt and 100% of oil for the Austro-Hungarian empire. From data about mining in the Split Mining Captaincy, prepared for the Philadelphia Exhibition, it can be seen that in the exploratory mining operations, in which there were 33,372 independent mines declared in 1925, they were looking mainly for bauxite (60.0%), then dark coal (19.0%), asphalts (10.3%) and lignites (62%). In 1931, within the area covered by the same captaincy, of 74 declared mines, only 9 were working. There were five coal mines, three bauxite mines and one for asphalt. I suggest that, within a state institution, the Mining Captaincy or Authority be renewed, or that a Mining and Geological Authority be set up, which would lead to the more complete affirmation of Croatian mining (the paper is published in Croatian).
Radioprotection and radiotherapy: new regulatory texts

International Nuclear Information System (INIS)

Cosset, J.M.

1998-01-01

This article reviews the radiation protection of workers in radiotherapy centers. The different texts are explained. These texts (international and European ones) aim to reinforce the protection of personnel working in radiotherapy services, to reduce as far as possible the deterministic and stochastic effects on organs outside the irradiated volumes, and to avoid severe accidents. Radiotherapists have to keep in mind that treatments must be clearly justified and optimized as far as reasonably achievable. (N.C.)
Modelling of Radiological Health Risks from Gold Mine Tailings in Wonderfonteinspruit Catchment Area, South Africa

Directory of Open Access Journals (Sweden)

Manny Mathuthu

2016-06-01

Full Text Available Mining is one of the major causes of elevation of naturally-occurring radionuclide material (NORM) concentrations on the Earth's surface. The aim of this study was to evaluate the human risk associated with exposure to NORMs in soils from mine tailings around a gold mine. A broad-energy germanium detector was used to measure activity concentrations of these NORMs in 66 soil samples (56 from five mine tailings and 10 from the control area). The RESidual RADioactivity (RESRAD) OFFSITE modeling program (version 3.1) was then used to estimate the radiation doses and the cancer morbidity risk of uranium-238 (238U), thorium-232 (232Th), and potassium-40 (40K) for a hypothetical resident scenario. According to the RESRAD prediction, the maximum total effective dose equivalent (TEDE) during 100 years was found to be 0.0315 mSv/year at year 30, while the maximum total excess cancer morbidity risk for all the pathways was 3.04 × 10^-5 at year 15. The US Environmental Protection Agency considers acceptable for regulatory purposes a cancer risk in the range of 10^-6 to 10^-4. Therefore, results obtained from the RESRAD OFFSITE code have shown that the health risk from gold mine tailings is within acceptable levels according to international standards.
