WorldWideScience

Sample records for mining information extraction

  1. Mining knowledge from text repositories using information extraction: A review

    Indian Academy of Sciences (India)

    Sandeep R Sirsat; Dr Vinay Chavan; Dr Shrinivas P Deshpande

    2014-02-01

    There are two approaches to mining text from online repositories. First, when the knowledge to be discovered is expressed directly in the documents to be mined, Information Extraction (IE) alone can serve as an effective tool for such text mining. Second, when the documents contain concrete data in unstructured form rather than abstract knowledge, Information Extraction (IE) can be used to first transform the unstructured data in the document corpus into a structured database, and then state-of-the-art data mining algorithms/tools can be applied to identify abstract patterns in this extracted data. This paper presents a review of several methods related to these two approaches.

  2. EnvMine: A text-mining system for the automatic extraction of contextual information

    Directory of Open Access Journals (Sweden)

    de Lorenzo Victor

    2010-06-01

    Background: For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such a description must be given in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would otherwise be difficult. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location also includes the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distances between the individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, such as published articles or web pages. This tool can help in determining the precise location and physicochemical
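
    A minimal sketch of the unit-anchored idea described above (not EnvMine's actual code or resources): a regular expression looks for a numeric value next to a known unit of measurement or variable name. The unit list, variable names and example sentence are assumptions made for illustration.

    ```python
    # Illustrative unit-anchored extraction of physicochemical variables; the unit
    # list and variable names are invented, not EnvMine's resources.
    import re

    UNITS = r"(?:°C|K|mM|µM|mg/L|g/L|ppm|psu|m)"   # tiny sample of measurement units
    PATTERN = re.compile(
        r"(?P<variable>pH|temperature|salinity|depth)?\s*"   # optional variable name
        r"(?:of|was|is|=|:)?\s*"
        r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>" + UNITS + r")?",
        re.IGNORECASE,
    )

    text = "Samples were taken at a depth of 35 m, temperature 12.5 °C and pH 7.8."
    for m in PATTERN.finditer(text):
        if m.group("unit") or m.group("variable"):   # keep only unit- or name-anchored hits
            print(m.group("variable"), m.group("value"), m.group("unit") or "")
    ```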

  3. Addressing Information Proliferation: Applications of Information Extraction and Text Mining

    Science.gov (United States)

    Li, Jingjing

    2013-01-01

    The advent of the Internet and the ever-increasing capacity of storage media have made it easy to store, deliver, and share enormous volumes of data, leading to a proliferation of information on the Web, in online libraries, on news wires, and almost everywhere in our daily lives. Since our ability to process and absorb this information remains…

  5. A construction scheme of web page comment information extraction system based on frequent subtree mining

    Science.gov (United States)

    Zhang, Xiaowen; Chen, Bingfeng

    2017-08-01

    Based on a frequent-subtree mining algorithm, this paper proposes a construction scheme for a web page comment information extraction system, referred to as the FSM system. The paper briefly introduces the overall system architecture and its modules, then describes the core of the system in detail, and finally presents a system prototype.
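
    A much-simplified stand-in for the idea (the FSM system's frequent-subtree algorithm is not reproduced here): count how often each tag path repeats across pages, on the assumption that templated comment blocks live in structural fragments that recur. The pages and tags below are made up.

    ```python
    # Simplified illustration: frequent tag *paths* rather than full subtrees.
    # Real frequent-subtree mining is considerably more involved.
    from collections import Counter
    from html.parser import HTMLParser

    class TagPathCounter(HTMLParser):
        def __init__(self):
            super().__init__()
            self.stack, self.paths = [], Counter()

        def handle_starttag(self, tag, attrs):
            self.stack.append(tag)
            self.paths["/".join(self.stack)] += 1

        def handle_endtag(self, tag):
            if self.stack and self.stack[-1] == tag:
                self.stack.pop()

    pages = [
        "<html><body><div class='comment'><p>Nice!</p></div><div class='comment'><p>+1</p></div></body></html>",
        "<html><body><div class='comment'><p>Agreed.</p></div></body></html>",
    ]
    counts = Counter()
    for page in pages:
        parser = TagPathCounter()
        parser.feed(page)
        counts.update(parser.paths)

    # Paths that repeat across pages are candidate comment containers.
    print(counts.most_common(3))
    ```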

  6. An Useful Information Extraction using Image Mining Techniques from Remotely Sensed Image (RSI)

    OpenAIRE

    Dr. C. Jothi Venkateswaran; Murugan, S.; Dr. N. Radhakrishnan

    2010-01-01

    Information extraction using mining techniques from remote sensing images (RSI) is rapidly gaining attention among researchers and decision makers because of its potential in application-oriented studies. Knowledge discovery from images poses many interesting challenges, such as preprocessing the image data set, training the data and discovering useful image patterns applicable to many new application frontiers. In the image-rich domain of RSI, image mining implies the synergy of data mining and ...

  7. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus

    OpenAIRE

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Background: Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text...

  8. An Useful Information Extraction using Image Mining Techniques from Remotely Sensed Image (RSI

    Directory of Open Access Journals (Sweden)

    Dr. C. Jothi Venkateswaran

    2010-11-01

    Information extraction using mining techniques from remote sensing images (RSI) is rapidly gaining attention among researchers and decision makers because of its potential in application-oriented studies. Knowledge discovery from images poses many interesting challenges, such as preprocessing the image data set, training the data and discovering useful image patterns applicable to many new application frontiers. In the image-rich domain of RSI, image mining implies the synergy of data mining and image processing technology. Such a culmination of techniques renders a valuable tool for information extraction. It also encompasses the problem of handling a large database of varied image data formats representing various levels of information, such as pixel, local and regional. In the present paper, various preprocessing corrections and techniques of image mining are discussed.

  9. Linking genes to literature: text mining, information extraction, and retrieval applications for biology.

    Science.gov (United States)

    Krallinger, Martin; Valencia, Alfonso; Hirschman, Lynette

    2008-01-01

    Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet http://zope.bioinfo.cnio.es/bionlp_tools/.

  10. CTSS: A Tool for Efficient Information Extraction with Soft Matching Rules for Text Mining

    Directory of Open Access Journals (Sweden)

    A. Christy

    2008-01-01

    The abundance of information available digitally in the modern world has created a demand for structured information. The problem of text mining, which deals with discovering useful information from unstructured text, has attracted the attention of researchers. The role of Information Extraction (IE) software is to identify relevant information in texts, extracting it from a variety of sources and aggregating it to create a single view. Information extraction systems depend on particular corpora and are poor in recall. Therefore, making such systems domain-independent while improving recall is an important challenge for IE. In this research, the authors propose a domain-independent algorithm for information extraction, called SOFTRULEMINING, for extracting the aim, methodology and conclusion from technical abstracts. The algorithm combines a trigram model with soft matching rules. A tool, CTSS, was constructed using SOFTRULEMINING and was tested on technical abstracts from www.computer.org and www.ansinet.org; the tool improved its recall value, and therefore its precision value, in comparison with other search engines.
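
    A rough sketch in the spirit of (but not identical to) the soft-matching idea above: abstract sentences are labelled aim, methodology or conclusion by fuzzy comparison against cue phrases. The cue phrases and threshold are assumptions, not the SOFTRULEMINING rules.

    ```python
    # Rule-based sentence labelling with "soft" matching against cue phrases.
    from difflib import SequenceMatcher

    CUES = {
        "aim": ["this paper proposes", "the aim of this work", "we present"],
        "methodology": ["the algorithm was implemented", "we used", "the method is based on"],
        "conclusion": ["the results show", "we conclude", "it was found that"],
    }

    def soft_label(sentence, threshold=0.6):
        sentence = sentence.lower()
        best_label, best_score = "other", 0.0
        for label, phrases in CUES.items():
            for phrase in phrases:
                # Compare the cue against the start of the sentence, tolerating variation.
                score = SequenceMatcher(None, phrase, sentence[: len(phrase) + 10]).ratio()
                if score > best_score:
                    best_label, best_score = label, score
        return best_label if best_score >= threshold else "other"

    abstract = [
        "This paper proposes a domain-independent algorithm for information extraction.",
        "The algorithm was implemented by combining a trigram model with soft matching rules.",
        "The results show an improved recall value compared with other search engines.",
    ]
    for sentence in abstract:
        print(soft_label(sentence), "|", sentence)
    ```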

  11. National information service in mining, mineral processing and extractive metallurgy. [MINTEC

    Energy Technology Data Exchange (ETDEWEB)

    Romaniuk, A.S.; MacDonald, R.J.C.

    1979-03-01

    More than a decade ago, CANMET management recognized the need to make better use of existing technological information in mining and extractive metallurgy, two fields basic to the economic well-being of Canada. There were at that time no indexes or files dedicated to disseminating technical information for the many minerals mined and processed in Canada, including coal. CANMET, with the nation's largest research and library resources in the minerals field, was in a unique position to fill this need. Initial efforts were concentrated on building a mining file beginning with identification of world sources of published information, development of a special thesaurus of terms for language control and adoption of a manual indexing/retrieval system. By early 1973, this file held 8,300 references, with source, abstract and keywords given for each reference. In mid-1973, operations were computerized. Software for indexing and retrieval by batch mode was written by CANMET staff to utilize the hardware facilities of EMR's Computer Science Center. The resulting MINTEC file, one of the few files of technological information produced in Canada, is the basis for the national literature search service in mining offered by CANMET. Attention is now focussed on building a sister-file in extractive metallurgy using the system already developed. Published information sources have been identified and a thesaurus of terms is being compiled and tested. The software developed for CANMET's file-building operations has several features, including the selective dissemination of information and production from magnetic tape of photoready copy for publication, as in a bi-monthly abstracts journal.

  12. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus.

    Science.gov (United States)

    Alnazzawi, Noha; Thompson, Paul; Batista-Navarro, Riza; Ananiadou, Sophia

    2015-01-01

    Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. Although the scope of our annotation is currently limited to a single
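
    The abstract notes that conditional random fields with a rich feature set worked best. A minimal sketch of such a token-level tagger (not the authors' system; the corpus, labels and features are invented, and the sklearn-crfsuite package is assumed):

    ```python
    # Token-level phenotype tagging with a CRF and a small illustrative feature set.
    # Requires: pip install sklearn-crfsuite
    import sklearn_crfsuite

    def token_features(tokens, i):
        word = tokens[i]
        return {
            "lower": word.lower(),
            "is_title": word.istitle(),
            "is_digit": word.isdigit(),
            "suffix3": word[-3:],
            "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
            "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        }

    # Toy training data: one sentence with BIO labels for a phenotype mention.
    sentences = [["Patient", "presents", "with", "congestive", "heart", "failure", "."]]
    labels = [["O", "O", "O", "B-PHENOTYPE", "I-PHENOTYPE", "I-PHENOTYPE", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))
    ```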

  13. Case study on the extraction of land cover information from the SAR image of a coal mining area

    Institute of Scientific and Technical Information of China (English)

    HU Zhao-ling; LI Hai-quan; DU Pei-jun

    2009-01-01

    In this study, analyses are conducted on the information features of a construction site, a cornfield and subsidence seeper land in a coal mining area with a synthetic aperture radar (SAR) image of medium resolution. Based on features of land cover of the coal mining area, on texture feature extraction and a selection method of a gray-level co-occurrence matrix (GLCM) of the SAR image, we propose in this study that the optimum window size for computing the GLCM is an appropriately sized window that can effectively distinguish different types of land cover. Next, a band combination was carried out over the texture feature images and the band-filtered SAR image to secure a new multi-band image. After the transformation of the new image with principal component analysis, a classification is conducted selectively on the three principal component bands with the most information. Finally, through training and experimenting with the samples, a better three-layered BP neural network was established to classify the SAR image. The results show that, assisted by texture information, the neural network classification improved the accuracy of SAR image classification by 14.6%, compared with a classification by maximum likelihood estimation without texture information.
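
    An illustrative sketch of the texture step only (not the authors' code): GLCM statistics computed in sliding windows, then reduced with principal component analysis. scikit-image (>= 0.19) and scikit-learn are assumed; the window size, gray-level count and the synthetic "SAR" array are placeholders.

    ```python
    # GLCM texture features per window, followed by PCA on the feature table.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.decomposition import PCA

    def glcm_features(window, levels=32):
        # Quantize to a few gray levels before building the co-occurrence matrix.
        q = (window / window.max() * (levels - 1)).astype(np.uint8)
        glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                            levels=levels, symmetric=True, normed=True)
        props = ("contrast", "homogeneity", "energy", "correlation")
        return [graycoprops(glcm, p).mean() for p in props]

    rng = np.random.default_rng(0)
    sar = rng.integers(1, 255, size=(256, 256)).astype(float)  # stand-in for a filtered SAR band

    win = 17  # an "optimum" window size would be chosen empirically, as the abstract notes
    feats = []
    for r in range(0, sar.shape[0] - win, win):
        for c in range(0, sar.shape[1] - win, win):
            feats.append(glcm_features(sar[r:r + win, c:c + win]))

    # Keep the principal components carrying most of the information.
    components = PCA(n_components=3).fit_transform(np.array(feats))
    print(components.shape)
    ```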

  14. Information extraction

    NARCIS (Netherlands)

    Zhang, Lei; Hoede, C.

    2002-01-01

    In this paper we present a new approach to extract relevant information by knowledge graphs from natural language text. We give a multiple level model based on knowledge graphs for describing template information, and investigate the concept of partial structural parsing. Moreover, we point out that

  15. Metaproteomics: extracting and mining proteome information to characterize metabolic activities in microbial communities.

    Science.gov (United States)

    Abraham, Paul E; Giannone, Richard J; Xiong, Weili; Hettich, Robert L

    2014-06-17

    Contemporary microbial ecology studies usually employ one or more "omics" approaches to investigate the structure and function of microbial communities. Among these, metaproteomics aims to characterize the metabolic activities of the microbial membership, providing a direct link between the genetic potential and functional metabolism. The successful deployment of metaproteomics research depends on the integration of high-quality experimental and bioinformatic techniques for uncovering the metabolic activities of a microbial community in a way that is complementary to other "meta-omic" approaches. The essential, quality-defining informatics steps in metaproteomics investigations are: (1) construction of the metagenome, (2) functional annotation of predicted protein-coding genes, (3) protein database searching, (4) protein inference, and (5) extraction of metabolic information. In this article, we provide an overview of current bioinformatic approaches and software implementations in metaproteome studies in order to highlight the key considerations needed for successful implementation of this powerful community-biology tool.

  16. Metaproteomics: extracting and mining proteome information to characterize metabolic activities in microbial communities

    Energy Technology Data Exchange (ETDEWEB)

    Abraham, Paul E [ORNL]; Giannone, Richard J [ORNL]; Xiong, Weili [ORNL]; Hettich, Robert L [ORNL]

    2014-01-01

    Contemporary microbial ecology studies usually employ one or more omics approaches to investigate the structure and function of microbial communities. Among these, metaproteomics aims to characterize the metabolic activities of the microbial membership, providing a direct link between the genetic potential and functional metabolism. The successful deployment of metaproteomics research depends on the integration of high-quality experimental and bioinformatic techniques for uncovering the metabolic activities of a microbial community in a way that is complementary to other meta-omic approaches. The essential, quality-defining informatics steps in metaproteomics investigations are: (1) construction of the metagenome, (2) functional annotation of predicted protein-coding genes, (3) protein database searching, (4) protein inference, and (5) extraction of metabolic information. In this article, we provide an overview of current bioinformatic approaches and software implementations in metaproteome studies in order to highlight the key considerations needed for successful implementation of this powerful community-biology tool.

  17. Mining Social Data to Extract Intellectual Knowledge

    Directory of Open Access Journals (Sweden)

    Muhammad Mahbubur Rahman

    2012-09-01

    Social data mining is an interesting phenomenon that colligates different sources of social data to extract information. This information can be used in relationship prediction, decision making, pattern recognition, social mapping, responsibility distribution and many other applications. This paper presents a systematic data mining architecture to mine intellectual knowledge from social data. In this research, we use the social networking site Facebook as the primary data source. We collect different attributes such as about me, comments, wall posts and age from Facebook as raw data and use advanced data mining approaches to excavate intellectual knowledge. We also analyze the mined knowledge with comparisons for possible usages such as human behavior prediction, pattern recognition, job responsibility distribution, decision making and product promotion.

  18. Mining of the social network extraction

    Science.gov (United States)

    Nasution, M. K. M.; Hardi, M.; Syah, R.

    2017-01-01

    The use of the Web as social media is steadily gaining ground in the study of social actor behaviour. However, information on the Web can only be interpreted according to the ability of the method used, for example superficial methods for extracting social networks. Such methods have features and drawbacks: they cannot reveal the behaviour of social actors directly, but they do carry hidden information about them. Therefore, this paper aims to reveal such information through social network mining. Social behaviour can be expressed through a set of words extracted from the list of snippets.
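
    A toy illustration of the "superficial" co-occurrence idea: two actors are linked when their names appear in the same snippet, and the remaining words of each snippet form a crude behavioural word set. The actor names and snippets are invented.

    ```python
    # Build actor-actor edges from snippet co-occurrence and keep surrounding words
    # as a rough behavioural description of each actor.
    from collections import defaultdict
    from itertools import combinations

    actors = ["alice", "bob", "carol"]
    snippets = [
        "Alice and Bob co-authored a paper on network analysis",
        "Bob presented a keynote with Carol at the mining workshop",
        "Alice reviewed the workshop proceedings",
    ]

    edges = defaultdict(int)
    behaviour = defaultdict(set)
    for snippet in snippets:
        words = snippet.lower().split()
        present = [a for a in actors if a in words]
        for a, b in combinations(sorted(present), 2):
            edges[(a, b)] += 1  # co-occurrence strength
        for a in present:
            behaviour[a].update(w for w in words if w not in actors and len(w) > 3)

    print(dict(edges))
    print(sorted(behaviour["bob"]))
    ```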

  19. Nuclear expert web mining system: monitoring and analysis of nuclear acceptance by information retrieval and opinion extraction on the Internet

    Energy Technology Data Exchange (ETDEWEB)

    Reis, Thiago; Barroso, Antonio C.O.; Imakuma, Kengo, E-mail: thiagoreis@usp.b, E-mail: barroso@ipen.b, E-mail: kimakuma@ipen.b [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)

    2011-07-01

    This paper presents a research initiative that aims to collect nuclear-related information and to analyze opinionated texts by mining the hypertextual data environment and social network web sites on the Internet. Different from previous approaches that employed traditional statistical techniques, a novel Web Mining approach, built using the concept of Expert Systems, is proposed for massive and autonomous data collection and analysis. The initial step has been accomplished, resulting in a framework design that is able to gradually encompass a set of evolving techniques, methods, and theories in such a way that this work will build a platform upon which new research can be performed more easily by just substituting modules or plugging in new ones. Upon completion it is expected that this research will contribute to the understanding of the population's views on nuclear technology and its acceptance. (author)

  20. Alteration Information Extraction by Applying Synthesis Processing Techniques to Landsat ETM+ Data: Case Study of Zhaoyuan Gold Mines, Shandong Province, China

    Institute of Scientific and Technical Information of China (English)

    Liu Fujiang; Wu Xincai; Sun Huashan; Guo Yan

    2007-01-01

    Satellite remote sensing data are usually used to analyze the spatial distribution pattern of geological structures and generally serve as a significant means for the identification of alteration zones. Based on the Landsat Enhanced Thematic Mapper (ETM+) data, which have better spectral resolution (8 bands) and spatial resolution (15 m in the PAN band), synthesis processing techniques were presented to fulfill alteration information extraction: data preparation, vegetation indices and band ratios, and expert classifier-based classification. These techniques have been implemented in the MapGIS-RSP software (version 1.0), developed by the Wuhan Zondy Cyber Technology Co., Ltd., China. In the study area application of extracting alteration information in the Zhaoyuan (招远) gold mines, Shandong (山东) Province, China, several hydrothermally altered zones (including two new sites) were found after satellite imagery interpretation coupled with field surveys. It is concluded that these synthesis processing techniques are useful approaches and are applicable to a wide range of gold-mineralized alteration information extraction.
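
    A hedged sketch of the vegetation-index and band-ratio step only (not the expert classifier, and not the study's thresholds): given reflectance arrays for a few ETM+ bands, an NDVI mask and two common alteration ratios flag candidate pixels.

    ```python
    # Band-ratio style screening for altered ground; arrays and thresholds are placeholders.
    import numpy as np

    rng = np.random.default_rng(1)
    shape = (100, 100)
    # Stand-ins for ETM+ reflectance bands (1 = blue, 3 = red, 4 = NIR, 5 and 7 = SWIR).
    blue, red, nir, swir1, swir2 = (rng.random(shape) for _ in range(5))

    eps = 1e-6
    ndvi = (nir - red) / (nir + red + eps)   # used to suppress vegetated pixels
    hydroxyl = swir1 / (swir2 + eps)         # often elevated over clay/sericite alteration
    iron = red / (blue + eps)                # crude iron-oxide indicator

    # Flag pixels where vegetation is sparse and both ratios are elevated.
    altered = (ndvi < 0.2) & (hydroxyl > 1.3) & (iron > 1.2)
    print("candidate altered pixels:", int(altered.sum()))
    ```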

  1. EXTRACTING KNOWLEDGE FROM DATA - DATA MINING

    Directory of Open Access Journals (Sweden)

    DIANA ELENA CODREANU

    2011-04-01

    Managers of economic organizations have a large volume of information at their disposal and face a virtual avalanche of information, yet they cannot operate by studying reports containing large volumes of detailed, uncorrelated data, because the fate of an organization may be decided in fractions of time. Thus, to take the best and most effective decisions in real time, managers need correct information presented quickly and synthetically, yet relevant enough to allow predictions and analysis. This paper highlights solutions for extracting knowledge from data, namely data mining. This technology does not merely verify hypotheses but aims at discovering new knowledge, so that the economic organization can cope with fierce competition in the market.

  2. Extracting geothermal heat from mines

    Energy Technology Data Exchange (ETDEWEB)

    Ednie, H.

    2007-03-15

    In response to environmental concerns, research is underway to find alternative methods of generating energy, including the use of low-temperature geothermal heat from mines. Geothermal energy is the energy produced internally by radiogenic heat production and long-term cooling of the planet. This energy can be used in various applications, including direct use for heating and electricity generation. The Earth/Mine Energy Resource Group (EMERG) at McGill University has worked on the development of alternative energies from both active and abandoned surface and underground mines. Geothermal heat from mines was once regarded as a benign energy source, particularly when compared to nuclear, oil, and coal. However, there is high potential for ground heat to be used as a sustainable solution to some energy requirements. EMERG's objective is to integrate alternate energy during the life of the mine, as well as after mine closure. Geothermal heat from mines will enable local communities to use this inexpensive source of energy for district heating of buildings, for drying food products, or for mining applications, such as heating deep oil sands deposits. Active or abandoned mines are ideal locations for geothermal systems. The first 100 metres underground is well suited for supply and storage of thermal energy. Due to the steady temperatures deep underground, geothermal sources are excellent fuels for heating and cooling systems. This article presented an example of a geothermal heat pump system used in Springhill, Nova Scotia, where Rock Can Am Ltd. is using floodwater from abandoned mines to heat and cool the company's facility at the site. The system produces annual savings of 600,000 kWh or $45,000 compared to conventional systems, proving that geothermal energy from abandoned or existing mines is a viable alternative energy source. Further efforts could result in it becoming a more effective and attractive option for the reclamation of abandoned mines.

  3. Enterprise Human Resources Information Mining Based on Improved Apriori Algorithm

    Directory of Open Access Journals (Sweden)

    Lei He

    2013-05-01

    With the unceasing development of information technology in today's society, enterprises' demand for human resources information mining is growing. Addressing this situation, this paper puts forward an improved Apriori-based model for enterprise human resources information mining. The model introduces data mining technology and the traditional Apriori algorithm and improves on it: the association rule mining task of the original algorithm is divided into two subtasks, producing frequent item sets and producing rules; SQL technology is used to generate the frequent item sets directly; and a table-building method is used to extract the information of interest to users. The experimental results show that the improved Apriori-based model for enterprise human resources information mining is more efficient than the original algorithm, and practical application tests show that the improved algorithm is practical and effective.
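
    For orientation, a minimal, self-contained Apriori-style sketch (not the paper's improved SQL-based variant): frequent itemsets are grown level by level and simple rules are derived from them. The transactions and thresholds are invented HR-flavoured examples.

    ```python
    # Minimal level-wise frequent itemset generation plus rule derivation.
    from itertools import combinations

    transactions = [
        {"engineer", "certified", "promoted"},
        {"engineer", "certified"},
        {"analyst", "certified", "promoted"},
        {"engineer", "promoted"},
    ]
    min_support, min_confidence = 0.5, 0.6

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Level-wise candidate generation and support-based pruning.
    items = sorted({i for t in transactions for i in t})
    frequent, current = {}, [frozenset([i]) for i in items]
    while current:
        current = [c for c in current if support(c) >= min_support]
        frequent.update({c: support(c) for c in current})
        current = list({a | b for a in current for b in current if len(a | b) == len(a) + 1})

    # Rules A -> B with confidence = support(A ∪ B) / support(A).
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = s / frequent[antecedent]
                if confidence >= min_confidence:
                    print(set(antecedent), "->", set(itemset - antecedent), round(confidence, 2))
    ```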

  4. Information extraction system

    Science.gov (United States)

    Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

    2014-05-13

    An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations as well as relationships and events from text documents is described herein.
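
    As a generic illustration of extracting person, organization and location mentions from text (this is not the patented system described above), a pre-trained NER model can be applied; spaCy and its small English model are assumed, and the sample sentence is made up.

    ```python
    # Generic named-entity extraction for people, organizations and locations.
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Dr. Alice Martin of Acme Corporation met officials in Geneva "
              "to discuss the new information extraction system.")

    for ent in doc.ents:
        if ent.label_ in {"PERSON", "ORG", "GPE", "LOC"}:
            print(ent.label_, "->", ent.text)
    ```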

  5. Features information extraction of the mining area based on CBERS-02B

    Institute of Scientific and Technical Information of China (English)

    王飞红; 任晓敏

    2013-01-01

    An object-oriented classification method is used to extract information about the Pingshuo surface coal mine in Shanxi Province, with China-Brazil Earth Resources Satellite (CBERS-02B) imagery as the data source. Multi-scale segmentation and a segmentation hierarchy are created by the object-oriented classification method. Through comparison of different segmentation results, the final segmentation scales are obtained. The spectral and spatial characteristics of specific surface features are then used to classify the image into vegetation, roads, mine construction, coal piles, mining faces and waste rock dumps, using the membership-function method of fuzzy classification. Finally, the classification result is evaluated with an error matrix; the overall classification accuracy reaches 88.03% and the Kappa coefficient is 0.88.
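
    A rough stand-in for the object-oriented workflow described above (the study's eCognition processing is not shown here): segment the image first, then label each segment with simple membership-style rules on its mean spectrum. scikit-image (>= 0.19) is assumed, and the image, segment count and thresholds are synthetic placeholders.

    ```python
    # Segment-then-classify sketch: SLIC superpixels plus crude per-object rules.
    import numpy as np
    from skimage.segmentation import slic

    rng = np.random.default_rng(2)
    image = rng.random((120, 120, 3))   # stand-in for a 3-band satellite image

    segments = slic(image, n_segments=150, compactness=10, channel_axis=-1)

    labels = {}
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        mean_bands = image[mask].mean(axis=0)   # per-object spectral mean
        # Illustrative "membership" rules on the object's mean spectrum.
        if mean_bands[1] > 0.55:
            labels[seg_id] = "vegetation"
        elif mean_bands.mean() < 0.35:
            labels[seg_id] = "coal pile / mining face"
        else:
            labels[seg_id] = "other"

    counts = {c: sum(1 for v in labels.values() if v == c) for c in set(labels.values())}
    print(counts)
    ```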

  6. Multimedia Information Extraction

    CERN Document Server

    Maybury, Mark T

    2012-01-01

    The advent of increasingly large consumer collections of audio (e.g., iTunes), imagery (e.g., Flickr), and video (e.g., YouTube) is driving a need not only for multimedia retrieval but also information extraction from and across media. Furthermore, industrial and government collections fuel requirements for stock media access, media preservation, broadcast news retrieval, identity management, and video surveillance.  While significant advances have been made in language processing for information extraction from unstructured multilingual text and extraction of objects from imagery and vid

  7. THE IDENTIFICATION OF PILL USING FEATURE EXTRACTION IN IMAGE MINING

    Directory of Open Access Journals (Sweden)

    A. Hema

    2015-02-01

    With the help of image mining techniques, an automatic pill identification system was investigated in this study for matching images of pills based on several features: imprint, color, size and shape. Image mining is an interdisciplinary task requiring expertise from various fields such as computer vision, image retrieval, image matching and pattern recognition. Image mining is the method by which unusual patterns are detected so that only hidden and useful image data are stored in a large database. It involves two different approaches to image matching. This research presents drug identification, registration, detection and matching, with text, color and shape extraction of the image, using image mining concepts to identify legal and illegal pills with greater accuracy. Initially, preprocessing is carried out using a novel interpolation algorithm, whose main aim is to reduce the artifacts, blurring and jagged edges introduced during up-sampling. The registration process then comprises two modules: feature extraction and corner detection. In feature extraction, noisy high-frequency edges are discarded and relevant high-frequency edges are selected. The corner detection approach detects the high-frequency pixels at intersection points, so the overall performance is improved. There is a need to segregate the dataset into groups based on the query image's size, shape, color, text, etc.; that process of segregating the required information is called feature extraction and is done using a geometrical gradient feature transformation. Finally, color and shape feature extraction were performed using a color histogram and a geometrical gradient vector. Simulation results show that the proposed techniques provide accurate retrieval results, both in terms of time and accuracy, when compared with conventional approaches.
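
    A small sketch of just one of the features mentioned above, the colour histogram, used here to rank candidate pill images by histogram intersection. The images are synthetic, and the bin count and similarity measure are arbitrary choices, not the paper's.

    ```python
    # Per-channel colour histograms as a simple descriptor for matching pill images.
    import numpy as np

    def colour_histogram(image, bins=8):
        """Concatenated, normalised per-channel histograms of an RGB image (0-255 values)."""
        hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
        h = np.concatenate(hists).astype(float)
        return h / h.sum()

    rng = np.random.default_rng(3)
    query = rng.integers(0, 256, (64, 64, 3))
    database = {"pill_A": rng.integers(0, 256, (64, 64, 3)),
                "pill_B": rng.integers(100, 200, (64, 64, 3))}

    q = colour_histogram(query)
    # Rank database pills by histogram intersection (higher means more similar colours).
    scores = {name: float(np.minimum(q, colour_histogram(img)).sum())
              for name, img in database.items()}
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
    ```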

  8. Analysis of Mining Terrain Deformation Characteristics with Deformation Information System

    Science.gov (United States)

    Blachowski, Jan; Milczarek, Wojciech; Grzempowski, Piotr

    2014-05-01

    Mapping and prediction of mining-related deformations of the earth surface is an important measure for minimising the threat to surface infrastructure, human population, the environment and the safety of the mining operation itself arising from underground extraction of useful minerals. The range of methods and techniques used for monitoring and analysis of mining terrain deformations is wide and growing with the development of geographical information technologies. These include, for example: terrestrial geodetic measurements, global positioning systems, remote sensing, spatial interpolation, finite element method modelling, GIS-based modelling, geological modelling, empirical modelling using the Knothe theory, artificial neural networks, fuzzy logic calculations and others. The aim of this paper is to introduce the concept of an integrated Deformation Information System (DIS) developed in a geographic information systems environment for analysis and modelling of various spatial data related to mining activity, and to demonstrate its applications for mapping and visualising, as well as identifying, possible mining terrain deformation areas with various spatial modelling methods. The DIS concept is based on connected modules that include: the spatial database, the core of the system; the spatial data collection module formed by terrestrial, satellite and remote sensing measurements of the ground changes; the spatial data mining module for data discovery and extraction; the geological modelling module; the spatial data modelling module with data processing algorithms for spatio-temporal analysis and mapping of mining deformations and their characteristics (e.g. deformation parameters: tilt, curvature and horizontal strain); the multivariate spatial data classification module; and the visualization module allowing two-dimensional interactive and static mapping and three-dimensional visualizations of mining ground characteristics. The System's functionality has been presented on

  9. Semi-structured Data Extraction and Schema Knowledge Mining

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    A semi-structured data extraction method to get the useful information embedded in a group of relevant web pages and store it with OEM (Object Exchange Model) is proposed. Then, a data mining method is adopted to discover the schema knowledge implicit in the semi-structured data. This knowledge can help users understand the information structure on the web more deeply and thoroughly. At the same time, it can also provide an effective schema for the querying of web information.

  10. Validity of association rules extracted by healthcare-data-mining.

    Science.gov (United States)

    Takeuchi, Hiroshi; Kodama, Naoki

    2014-01-01

    A personal healthcare system used with cloud computing has been developed. It enables a daily time-series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare-data-mining. This study verified that the rules extracted from daily time-series data stored over half a year by volunteer users of this system are valid.

  11. A Survey on Web Text Information Retrieval in Text Mining

    Directory of Open Access Journals (Sweden)

    Tapaswini Nayak

    2015-08-01

    In this study we analyze different techniques for information retrieval in text mining, with the aim of identifying approaches to web text information retrieval. Text mining is similar to analytics: a process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends, by means such as statistical pattern learning. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, creation of coarse taxonomies, sentiment analysis, document summarization and entity relation modeling. Text mining is used to mine hidden information from unstructured or semi-structured data. This capability is necessary because a large amount of Web information is semi-structured due to the nested structure of HTML code, is linked, and is redundant. Web content categorization with a content database is the most important tool for the efficient use of search engines. A customer requesting information on a particular subject or item would otherwise have to search through hundreds of results to find the most relevant information for their query. The use of text mining reduces these hundreds of results, eliminating the aggravation and improving the navigation of information on the Web.
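
    A tiny text-categorization sketch for the web-content categorization task mentioned above, using TF-IDF features and a Naive Bayes classifier; scikit-learn is assumed and the documents and categories are made up.

    ```python
    # TF-IDF + Naive Bayes pipeline for assigning a category to a short document.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = [
        "open pit mining equipment and haul trucks",
        "coal extraction and mine safety regulations",
        "text mining extracts entities from documents",
        "clustering and categorization of web pages",
    ]
    categories = ["mining", "mining", "text-mining", "text-mining"]

    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(docs, categories)
    print(model.predict(["web page clustering for search engines"]))
    ```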

  12. Mars Target Encyclopedia: Information Extraction for Planetary Science

    Science.gov (United States)

    Wagstaff, K. L.; Francis, R.; Gowda, T.; Lu, Y.; Riloff, E.; Singh, K.

    2017-06-01

    Mars surface targets / and published compositions / Seek and ye will find. We used text mining methods to extract information from LPSC abstracts about the composition of Mars surface targets. Users can search by element, mineral, or target.

  13. The Application and the Research of Object-oriented Method for Extraction of Mining Area Information

    Institute of Scientific and Technical Information of China (English)

    袁定波; 刘成林; 汪国斌

    2013-01-01

    By using TM images with a resolution of 30 m as the basic data and the object-oriented software eCognition as the information extraction tool, this experiment extracted mining area information with an object-oriented method. It also used the spatial analysis functions of ArcGIS for further improvement and optimization of the classification results, yielding a fairly accurate classification vector map. Experimental results show that an object-oriented classification method based on multi-scale segmentation can effectively avoid the "salt and pepper" phenomenon of traditional pixel-based classification and can reach more accurate results with more precise boundaries. The experiment suggests a new approach for mining area information extraction and for studies of mining area impact.

  14. Mining information from atom probe data.

    Science.gov (United States)

    Cairney, Julie M; Rajan, Krishna; Haley, Daniel; Gault, Baptiste; Bagot, Paul A J; Choi, Pyuck-Pa; Felfer, Peter J; Ringer, Simon P; Marceau, Ross K W; Moody, Michael P

    2015-12-01

    Whilst atom probe tomography (APT) is a powerful technique with the capacity to gather information containing hundreds of millions of atoms from a single specimen, the ability to effectively use this information creates significant challenges. The main technological bottleneck lies in handling the extremely large amounts of data on spatial-chemical correlations, as well as developing new quantitative computational foundations for image reconstruction that target critical and transformative problems in materials science. The power to explore materials at the atomic scale with the extraordinary level of sensitivity of detection offered by atom probe tomography has not been fully harnessed due to the challenges of dealing with missing, sparse and often noisy data. Hence there is a profound need to couple the analytical tools to deal with the data challenges with the experimental issues associated with this instrument. In this paper we provide a summary of some key issues associated with the challenges, and solutions to extract or "mine" fundamental materials science information from that data.

  15. Analytical Study of Feature Extraction Techniques in Opinion Mining

    Directory of Open Access Journals (Sweden)

    Pravesh Kumar Singh

    2013-07-01

    Although opinion mining is in a nascent stage of development, the ground is set for dense growth of research in the field. One of the important activities of opinion mining is to extract people's opinions based on characteristics of the object under study. Feature extraction in opinion mining can be done in various ways, such as clustering, support vector machines, etc. This paper is an attempt to appraise the various techniques of feature extraction. The first part discusses various techniques and the second part makes a detailed appraisal of the major techniques used for feature extraction.

  16. Extracting mining subsidence land from remote sensing images based on domain knowledge

    Institute of Scientific and Technical Information of China (English)

    WANG Xing-feng; WANG Yun-jia; HUANG Tai

    2008-01-01

    Extracting mining subsidence land from RS images is an important research topic for environmental monitoring in mining areas. The accuracy of traditional extraction models based on spectral features is low. In order to extract subsidence land from RS images with high accuracy, domain knowledge should be imported and new models should be proposed. This paper, addressing the disadvantages of traditional extraction models, imports domain knowledge from practice and experience, converts semantic knowledge into digital information, and proposes a new model for this specific task. Selecting the Luan mining area as the study area, the new model is tested based on GIS and related knowledge. The result shows that the proposed method is more precise than traditional methods and can satisfy the demands of land subsidence monitoring in mining areas.

  17. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  18. Sustainable rehabilitation of mining waste and acid mine drainage using geochemistry, mine type, mineralogy, texture, ore extraction and climate knowledge.

    Science.gov (United States)

    Anawar, Hossain Md

    2015-08-01

    The oxidative dissolution of sulfidic minerals releases extremely acidic leachate, sulfate and potentially toxic elements, e.g., As, Ag, Cd, Cr, Cu, Hg, Ni, Pb, Sb, Th, U, Zn, etc., from different mine tailings and waste dumps. For the sustainable rehabilitation and disposal of mining waste, the sources and mechanisms of contaminant generation, and the fate and transport of contaminants, should be clearly understood. Therefore, this study provides a critical review of (1) recent insights into mechanisms of oxidation of sulfidic minerals, (2) environmental contamination by mining waste, and (3) remediation and rehabilitation techniques, and (4) then develops the GEMTEC conceptual model/guide [(bio)geochemistry - mine type - mineralogy - geological texture - ore extraction process - climatic knowledge] to provide a new scientific approach and knowledge for the remediation of mining wastes and acid mine drainage. This study suggests pre-mining geological, geochemical, mineralogical and microtextural characterization of different mineral deposits, and post-mining studies of ore extraction processes, physical, geochemical, mineralogical and microbial reactions, natural attenuation and the effect of climate change, for sustainable rehabilitation of mining waste. All components of this model should be considered for effective and integrated management of mining waste and acid mine drainage.

  19. Mining the Temporal Dimension of the Information Propagation

    Science.gov (United States)

    Berlingerio, Michele; Coscia, Michele; Giannotti, Fosca

    In the last decade, Social Network Analysis has been a field in which the effort devoted by researchers in the Data Mining area has increased very fast. Among the possible related topics, the study of information propagation in a network has attracted the interest of many researchers, also from the industrial world. However, only a few answers to the questions “How does information propagate over a network, why, and how fast?” have been found so far. On the other hand, these answers are of large interest, since they help in the tasks of finding experts in a network, assessing viral marketing strategies, and identifying fast or slow paths of information inside a collaborative network. In this paper we study the problem of finding frequent patterns in a network with the help of two different techniques: TAS (Temporally Annotated Sequences) mining, aimed at extracting sequential patterns where each transition between two events is annotated with a typical transition time that emerges from input data, and Graph Mining, which is helpful for locally analyzing the nodes of the networks with their properties. Finally we show preliminary results in the direction of mining information propagation over a network, performed on two well-known email datasets, that show the power of the combination of these two approaches.

  20. Mining Hesitation Information by Vague Association Rules

    Science.gov (United States)

    Lu, An; Ng, Wilfred

    In many online shopping applications, such as Amazon and eBay, traditional Association Rule (AR) mining has limitations as it only deals with the items that are sold but ignores the items that are almost sold (for example, those items that are put into the basket but not checked out). We say that those almost sold items carry hesitation information, since customers are hesitating to buy them. The hesitation information of items is valuable knowledge for the design of good selling strategies. However, there is no conceptual model that is able to capture different statuses of hesitation information. Herein, we apply and extend vague set theory in the context of AR mining. We define the concepts of attractiveness and hesitation of an item, which represent the overall information of a customer's intent on an item. Based on the two concepts, we propose the notion of Vague Association Rules (VARs). We devise an efficient algorithm to mine the VARs. Our experiments show that our algorithm is efficient and the VARs capture more specific and richer information than do the traditional ARs.

  1. Mine railway equipments management information system

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, X.; Han, K.; Duan, T.; Liu, Z.; Lu, H. [China University of Mining and Technology, Xuzhou (China)]

    2007-06-15

    Based on client/server and browser/server models, the management information system described realizes entire life-cycle management of mine railway equipment, covering both universal and special equipment in the locomotive depot, track maintenance division, electrical depot and car depot. The system has other online functions such as transmitting reports, graphics management, statistics, searches, a graphics wizard and web publicity. It was applied in Pingdingshan Coal Co. Ltd.'s Railway Transport Department. 5 refs., 4 figs.

  2. Data Mining for Security Information: A Survey

    Energy Technology Data Exchange (ETDEWEB)

    Brugger, S T; Kelley, M; Sumikawa, K; Wakumoto, S

    2001-04-19

    This paper will present a survey of the current published work and products available to do off-line data mining for computer network security information. Hundreds of megabytes of data are collected every second that are of interest to computer security professionals. This data can answer questions ranging from the proactive, "Which machines are the attackers going to try to compromise?" to the reactive, "When did the intruder break into my system and how?" Unfortunately, there's so much data that computer security professionals don't have time to sort through it all. What we need are systems that perform data mining at various levels on this corpus of data in order to ease the burden of the human analyst. Such systems typically operate on log data produced by hosts, firewalls and intrusion detection systems as such data is typically in a standard, machine readable format and usually provides information that is most relevant to the security of the system. Systems that do this type of data mining for security information fall under the classification of intrusion detection systems. It is important to point out that we are not surveying real-time intrusion detection systems. Instead, we examined what is possible when the analysis is done off-line. Doing the analysis off-line allows for a larger amount of data correlation between distant sites who transfer relevant log files periodically and may be able to take greater advantage of an archive of past logs. Such a system is not a replacement for a real-time intrusion detection system but should be used in conjunction with one. In fact, as noted previously, the logs of the real-time IDS may be one of the inputs to the data mining system. We will concentrate on the application of data mining to network connection data, as opposed to system logs or the output of real-time intrusion detection systems. We do this primarily because this data is readily obtained from

  3. A Mining Algorithm for Extracting Decision Process Data Models

    Directory of Open Access Journals (Sweden)

    Cristina-Claudia DOLEAN

    2011-01-01

    The paper introduces an algorithm that mines logs of user interaction with simulation software. It outputs a model that explicitly shows the data perspective of the decision process, namely the Decision Data Model (DDM). In the first part of the paper we focus on how the DDM is extracted by our mining algorithm. We introduce it as pseudo-code and then provide explanations and examples of how it actually works. In the second part of the paper, we use a series of small case studies to prove the robustness of the mining algorithm and how it deals with the most common patterns we found in real logs.

  4. Developments of spatial information-based Digital Mine in China

    Institute of Scientific and Technical Information of China (English)

    WU Li-xin; CHE De-fu

    2008-01-01

    This paper gives a brief introduction to the origin, concept and hierarchical structure of Digital Mine. As a huge complex system, Digital Mine takes a database and a model base together, forming a mine data management system, as its core, and is comprised of five subsystems: a data obtaining system, an integral dispatching system, an applied engineering system, a data processing system, and a data management system. Being a digitally 3D visualized representation and a spatial information infrastructure of an actual mine, Digital Mine has three basic features: data warehouse, information reference and digital platform. The present developments of Digital Mine in the mining industry, research and education are also introduced, and examples are given of current Digital Mine construction in China. The development trends, key technologies and recent construction procedures for Digital Mine are presented.

  5. MONITORING OF COAL BED EXTRACTION AS AN EFFECTIVE TOOL FOR IMPROVING THE PRODUCTION RESULTS OF A MINE

    Directory of Open Access Journals (Sweden)

    Witold BIAŁY

    2015-07-01

    The basic source of information necessary for proper and effective management of a hard coal mine is continuous monitoring of the mining process. An increased number of machines and devices used in a mine caused a need for continuous monitoring of mining departments' work. Monitoring of the extraction of hard coal beds is crucial for this process management, as it determines the proper course of the mining process. Hence, monitoring can be considered the most important element of the controlling process, especially in the area of mining process management in a mine. Effective monitoring and proper, quick reaction to any irregularities in this process have a significant influence on the production results of a mine.

  6. A Financial Data Mining Model for Extracting Customer Behavior

    Directory of Open Access Journals (Sweden)

    Mark K.Y. Mak

    2011-08-01

    Facing the problem of variation and chaotic behavior of customers, the lack of sufficient information is a challenge to many business organizations. Human analysts lacking an understanding of the hidden patterns in business data can miss corporate business opportunities. In order to embrace all business opportunities and enhance competitiveness, the discovery of hidden knowledge, unexpected patterns and useful rules from large databases has provided a feasible solution for several decades. While there is a wide range of financial analysis products in the financial market, how to customize the investment portfolio for the customer is still a challenge to many financial institutions. This paper aims at developing an intelligent Financial Data Mining Model (FDMM) for extracting customer behavior in the financial industry, so as to increase the availability of decision-support data and hence increase customer satisfaction. The proposed financial model first clusters the customers into several sectors and then finds the correlation among these sectors. Better customer segmentation can increase the ability to identify targeted customers, so extracting useful rules for specific clusters can provide insight into customers' buying behavior and marketing implications. To validate the feasibility of the proposed model, a simple dataset was collected from a financial company in Hong Kong. The simulation experiments show that the proposed method not only can improve the workflow of a financial company, but also deepen understanding of investment behavior. Thus, a corporation is able to customize the most suitable products and services for customers on the basis of the extracted rules.
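
    A sketch of only the first step described above, clustering customers into sectors; the features, cluster count and data are invented for illustration, and scikit-learn is assumed.

    ```python
    # K-means customer segmentation on standardised, synthetic financial features.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(4)
    # Columns: age, annual income, risk tolerance score, average monthly investment.
    customers = np.column_stack([
        rng.normal(45, 12, 200),
        rng.lognormal(10.5, 0.4, 200),
        rng.uniform(0, 1, 200),
        rng.lognormal(6, 0.8, 200),
    ])

    features = StandardScaler().fit_transform(customers)
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)

    # Per-sector profiles can then be examined to derive portfolio-targeting rules.
    for sector in range(4):
        print("sector", sector, "size", int((kmeans.labels_ == sector).sum()))
    ```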

  7. Information Extraction From Chemical Patents

    Directory of Open Access Journals (Sweden)

    Sandra Bergmann

    2012-01-01

    The development of new chemicals or pharmaceuticals is preceded by an in-depth analysis of published patents in this field. This information retrieval is a costly and time-inefficient step when done by a human reader, yet it is mandatory for the potential success of an investment. The goal of the research project UIMA-HPC is to automate and hence speed up the process of knowledge mining from patents. Multi-threaded analysis engines, developed according to UIMA (Unstructured Information Management Architecture) standards, process texts and images in thousands of documents in parallel. UNICORE (UNiform Interface to COmputing Resources) workflow control structures make it possible to dynamically allocate resources for every given task to gain the best CPU-time/real-time ratios in an HPC environment.

  8. A REVIEW ON TEXT MINING IN DATA MINING

    OpenAIRE

    2016-01-01

    Data mining is knowledge discovery in databases, and the goal is to extract patterns and knowledge from large amounts of data. An important term in data mining is text mining. Text mining extracts high-quality information from text; statistical pattern learning is used to obtain this high-quality information. High quality in text mining refers to a combination of relevance, novelty and interestingness. Tasks in text mining are text categorization, text clustering, entity extraction and sentim...

  9. Extracting useful information from images

    DEFF Research Database (Denmark)

    Kucheryavskiy, Sergey

    2011-01-01

    The paper presents an overview of methods for extracting useful information from digital images. It covers various approaches that utilize different properties of images, like intensity distribution, spatial frequency content and several others. A few case studies including isotropic...... and heterogeneous, congruent and non-congruent images are used to illustrate how the described methods work and to compare some of them...

  10. Image Information Mining System Evaluation Using Information-Theoretic Measures

    Directory of Open Access Journals (Sweden)

    Mihai Datcu

    2005-08-01

    Full Text Available During the last decade, the exponential increase of multimedia and remote sensing image archives, the fast expansion of the world wide web, and the high diversity of users have yielded concepts and systems for successful content-based image retrieval and image information mining. Image data information systems require both database and visual capabilities, but there is a gap between these systems. Database systems usually do not deal with multidimensional pictorial structures and vision systems do not provide database query functions. In terms of these points, the evaluation of content-based image retrieval systems became a focus of research interest. One can find several system evaluation approaches in literature, however, only few of them go beyond precision-recall graphs and do not allow a detailed evaluation of an interactive image retrieval system. Apart from the existing evaluation methodologies, we aim at the overall validation of our knowledge-driven content-based image information mining system. In this paper, an evaluation approach is demonstrated that is based on information-theoretic quantities to determine the information flow between system levels of different semantic abstraction and to analyze human-computer interactions.

  11. Image Information Mining System Evaluation Using Information-Theoretic Measures

    Science.gov (United States)

    Daschiel, Herbert; Datcu, Mihai

    2005-12-01

    During the last decade, the exponential increase of multimedia and remote sensing image archives, the fast expansion of the world wide web, and the high diversity of users have yielded concepts and systems for successful content-based image retrieval and image information mining. Image data information systems require both database and visual capabilities, but there is a gap between these systems. Database systems usually do not deal with multidimensional pictorial structures and vision systems do not provide database query functions. In terms of these points, the evaluation of content-based image retrieval systems became a focus of research interest. One can find several system evaluation approaches in literature, however, only few of them go beyond precision-recall graphs and do not allow a detailed evaluation of an interactive image retrieval system. Apart from the existing evaluation methodologies, we aim at the overall validation of our knowledge-driven content-based image information mining system. In this paper, an evaluation approach is demonstrated that is based on information-theoretic quantities to determine the information flow between system levels of different semantic abstraction and to analyze human-computer interactions.

  12. PALM-IST: Pathway Assembly from Literature Mining - an Information Search Tool

    Science.gov (United States)

    Mandloi, Sapan; Chakrabarti, Saikat

    2015-01-01

    Manual curation of biomedical literature has become an extremely tedious process due to its exponential growth in recent years. To extract meaningful information from such large volumes of unstructured text, newer and more efficient mining tools are required. Here, we introduce PALM-IST, a computational platform that not only allows users to explore biomedical abstracts using keyword based text mining but also extracts biological entity (e.g., gene/protein, drug, disease, biological processes, cellular component, etc.) information from the extracted text and subsequently mines various databases to provide their comprehensive inter-relation (e.g., interaction, expression, etc.). PALM-IST constructs protein interaction network and pathway information data relevant to the text search using multiple data mining tools and assembles them to create a meta-interaction network. It also analyzes scientific collaboration by extracting and creating a “co-authorship network” for a given search context. Hence, this useful combination of literature and data mining provided in PALM-IST can be used to extract novel protein-protein interactions (PPIs), to generate meta-pathways and further to identify key crosstalk and bottleneck proteins. PALM-IST is available at www.hpppi.iicb.res.in/ctm. PMID:25989388
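
    As a rough illustration of the co-authorship analysis mentioned above (not PALM-IST's actual pipeline), a weighted co-author graph can be built from paper author lists with networkx; the author names below are made up.

```python
from itertools import combinations
import networkx as nx

# Hypothetical author lists, one per retrieved abstract
papers = [
    ["Author A", "Author B"],
    ["Author B", "Author C"],
    ["Author C", "Author A", "Author D"],
]

G = nx.Graph()
for authors in papers:
    for a, b in combinations(authors, 2):          # every pair co-occurring on a paper
        weight = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)        # count shared papers as edge weight

print(nx.degree_centrality(G))                     # highly collaborative authors stand out
```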

  13. Mining chemical structural information from the drug literature.

    Science.gov (United States)

    Banville, Debra L

    2006-01-01

    It is easier to find too many documents on a life science topic than to find the right information inside these documents. With the application of text data mining to biological documents, it is no surprise that researchers are starting to look at applications that mine out chemical information. The mining of chemical entities--names and structures--brings with it some unique challenges, which commercial and academic efforts are beginning to address. Ultimately, life science text data mining applications need to focus on the marriage of biological and chemical information.

  14. Informed consent in dental extractions.

    Directory of Open Access Journals (Sweden)

    José Luis Capote Femenías

    2009-07-01

    Full Text Available When performing any oral intervention, particularly dental extractions, the specialist should have the oral or written consent of the patient. This consent includes the explanation of all possible complications, whether typical, very serious or personalized ones associated with the previous health condition, age, profession, religion or any other characteristic of the patient, as well as the possible benefits of the intervention. This article deals with the bioethical aspects of dental extractions, in order to determine the main elements that the informed consent should include.

  15. Mining chemical information from open patents

    Directory of Open Access Journals (Sweden)

    Jessop David M

    2011-10-01

    Full Text Available Abstract Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents that comprised 10 weeks' worth of publications from the European Patent Office (EPO, with a precision of 78% and recall of 64% with regards to determining the identity and amount of reactants employed and an accuracy of 92% with regards to product identification. NMR spectra reported as product characterisation data are additionally captured.

  16. Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

    Science.gov (United States)

    Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

    2015-06-06

    Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of it remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction, with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation disease associations in curated database records. The discourse-level analysis component of MutD contributed a gain of more than 10% in F-measure when compared against sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating

  17. MBA: a literature mining system for extracting biomedical abbreviations

    Directory of Open Access Journals (Sweden)

    Lei YiMing

    2009-01-01

    Full Text Available Abstract Background The exploding growth of the biomedical literature presents many challenges for biological researchers. One such challenge comes from the heavy use of abbreviations. Extracting abbreviations and their definitions accurately is very helpful to biologists and also facilitates biomedical text analysis. Existing approaches fall into four broad categories: rule based, machine learning based, text alignment based and statistically based. State-of-the-art methods either focus exclusively on acronym-type abbreviations or cannot recognize rare abbreviations. We propose a systematic method to extract abbreviations effectively. First, a scoring method is used to classify the abbreviations into acronym-type and non-acronym-type abbreviations, and then their corresponding definitions are identified by two different methods: a text alignment algorithm for the former and a statistical method for the latter. Results A literature mining system MBA was constructed to extract both acronym-type and non-acronym-type abbreviations. An abbreviation-tagged literature corpus, called the Medstract gold standard corpus, was used to evaluate the system. MBA achieved a recall of 88% at a precision of 91% on the Medstract gold-standard corpus. Conclusion We present a new literature mining system MBA for extracting biomedical abbreviations. Our evaluation demonstrates that the MBA system performs better than the others. It can identify the definitions not only of acronym-type abbreviations, including slightly irregular acronym-type abbreviations (e.g., , but also of non-acronym-type abbreviations (e.g., .
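
    A much-simplified sketch of acronym-type abbreviation matching (checking the initials of the preceding words against a parenthesised short form); this is not MBA's scoring or alignment method, and the example sentence is invented.

```python
import re

def find_acronym_definitions(sentence):
    """Pair 'long form (SF)' patterns by checking word initials against the short form."""
    pairs = []
    for m in re.finditer(r"\(([A-Z][A-Za-z]{1,9})\)", sentence):
        short = m.group(1)
        words = sentence[:m.start()].rstrip().split()
        candidate = words[-len(short):]                 # one word per letter of the short form
        if len(candidate) == len(short) and all(
            w[0].upper() == c for w, c in zip(candidate, short.upper())
        ):
            pairs.append((" ".join(candidate), short))
    return pairs

print(find_acronym_definitions(
    "Protein Data Bank (PDB) entries were mined for hidden Markov model (HMM) features."))
# [('Protein Data Bank', 'PDB'), ('hidden Markov model', 'HMM')]
```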

  18. Vocabulary Mining for Information Retrieval: Rough Sets and Fuzzy Sets.

    Science.gov (United States)

    Srinivasan, Padmini; Ruiz, Miguel E.; Kraft, Donald H.; Chen, Jianhua

    2001-01-01

    Explains vocabulary mining in information retrieval and describes a framework for vocabulary mining that allows the use of rough set-based approximations even when documents and queries are described using weighted, or fuzzy, representations. Examines coordination between multiple vocabulary views and applies the framework to the Unified Medical…

  19. Web-Based Information Extraction Technology

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Information extraction techniques on the Web are a current research hotspot. Many information extraction techniques based on different principles have appeared, with different capabilities. We classify the existing information extraction techniques by the principle of extraction and analyze the methods and principles of semantic information addition, schema definition, rule expression, semantic item location and object location in these approaches. Based on the above survey and analysis, several open problems are discussed.

  20. Web Mining: Penning an Era of Information Age

    OpenAIRE

    Anshika Goel; Dinesh Sahu; Manish Kumar

    2014-01-01

    Today's age is rightly pronounced as "Information Age" which stands on the edifice of Information Technology and is operated by the Internet through the concept of web mining and is maintained & evolved through the high-speed technology of cloud computing. In short, if we try to summarize the situation, we would find that web mining concept has fuelled the entire process. This paper is an attempt to put light on the aspect of how web mining has penned the information age by co...

  1. Issues in Data Mining and Information Retrieval

    Directory of Open Access Journals (Sweden)

    Ammar Yassir

    2012-02-01

    Full Text Available Data mining, as we use the term, is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. For the purposes of this book, we assume that the goal of data mining is to allow a corporation to improve its marketing, sales, and customer support operations through a better understanding of its customers. Keep in mind, however, that the data mining techniques and tools described here are equally applicable in fields ranging from law enforcement to radio astronomy, medicine, and industrial process control. In fact, hardly any of the data mining algorithms were first invented with commercial applications in mind. The commercial data miner employs a grab bag of techniques borrowed from statistics, computer science, and machine learning research. The choice of a particular combination of techniques to apply in a particular situation depends on the nature of the data mining task, the nature of the available data, and the skills and preferences of the data miner. Data mining is largely concerned with building models. A model is simply an algorithm or set of rules that connects a collection of inputs (often in the form of fields in a corporate database) to a particular target or outcome.

  2. Information Extraction and Webpage Understanding

    Directory of Open Access Journals (Sweden)

    M.Sharmila Begum

    2011-11-01

    Full Text Available The two most important tasks in information extraction from the Web are webpage structure understanding and natural language sentence processing. However, little work has been done toward an integrated statistical model for understanding webpage structures and processing natural language sentences within the HTML elements. Our recent work on webpage understanding introduces a joint model of Hierarchical Conditional Random Fields (HCRFs) and extended Semi-Markov Conditional Random Fields (Semi-CRFs) to leverage the page structure understanding results in free text segmentation and labeling. In this top-down integration model, the decisions of the HCRF model can guide the decision making of the Semi-CRF model. However, the drawback of the top-down integration strategy is also apparent, i.e., the decisions of the Semi-CRF model cannot be used by the HCRF model to guide its own decision making. This paper proposes a novel framework called WebNLP, which enables bidirectional integration of page structure understanding and text understanding in an iterative manner. We have applied the proposed framework to local business entity extraction and Chinese person and organization name extraction. Experiments show that the WebNLP framework achieves significantly better performance than existing methods.

  3. Vaccine adverse event text mining system for extracting features from vaccine safety reports.

    Science.gov (United States)

    Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert

    2012-01-01

    Our objective was to develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined the primary (diagnosis and cause of death) and secondary features (e.g., symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this was equal to that of the text classifier (83.1%) and better than that of the online tool (40.7%). Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
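
    For reference, the text-mining metrics quoted above combine as follows; the counts in this sketch are invented and are not taken from the VAERS evaluation.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard text-mining metrics from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 90 correctly extracted diagnosis features, 10 spurious, 12 missed
p, r, f = precision_recall_f1(tp=90, fp=10, fn=12)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```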

  4. Data Visualization in Information Retrieval and Data Mining (SIG VIS).

    Science.gov (United States)

    Efthimiadis, Efthimis

    2000-01-01

    Presents abstracts that discuss using data visualization for information retrieval and data mining, including immersive information space and spatial metaphors; spatial data using multi-dimensional matrices with maps; TREC (Text Retrieval Conference) experiments; users' information needs in cartographic information retrieval; and users' relevance…

  5. Extracting information from multiplex networks.

    Science.gov (United States)

    Iacovacci, Jacopo; Bianconi, Ginestra

    2016-06-01

    Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes are connected by links that have different connotations. Multiplex networks are ubiquitous since they describe social, financial, engineering, and biological networks as well. Extending our ability to analyze complex networks to multiplex network structures greatly increases the level of information that it is possible to extract from big data. For these reasons, characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper, we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function Θ̃(S) for describing their mesoscale organization and community structure. As working examples for studying these measures, we consider three multiplex network datasets coming from social science.
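
    A simplified illustration of centrality in a multiplex setting using networkx: PageRank is computed per layer and then summed. This is only a toy aggregation for intuition, not the Multiplex PageRank algorithm analysed in the paper, and the two-layer example network is invented.

```python
import networkx as nx

# Two layers over the same node set, with different link connotations
layers = {
    "social":    nx.Graph([("a", "b"), ("b", "c"), ("c", "a")]),
    "financial": nx.Graph([("a", "c"), ("c", "d")]),
}

per_layer = {name: nx.pagerank(G) for name, G in layers.items()}
nodes = set().union(*(G.nodes for G in layers.values()))

# Naive aggregate centrality: sum of layer-wise PageRank scores
combined = {n: sum(scores.get(n, 0.0) for scores in per_layer.values()) for n in nodes}
print(sorted(combined.items(), key=lambda kv: -kv[1]))
```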

  6. Extracting Information from Multiplex Networks

    CERN Document Server

    Iacovacci, Jacopo

    2016-01-01

    Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes are connected by links that have different connotations. Multiplex networks are ubiquitous since they describe social, financial, engineering and biological networks as well. Extending our ability to analyze complex networks to multiplex network structures increases greatly the level of information that is possible to extract from Big Data. For these reasons characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function $\\widetilde{\\Theta}^{S}$ for describing their mesoscale organization and community structure. As working examples for studying thes...

  7. Extracting information from multiplex networks

    Science.gov (United States)

    Iacovacci, Jacopo; Bianconi, Ginestra

    2016-06-01

    Multiplex networks are generalized network structures that are able to describe networks in which the same set of nodes are connected by links that have different connotations. Multiplex networks are ubiquitous since they describe social, financial, engineering, and biological networks as well. Extending our ability to analyze complex networks to multiplex network structures greatly increases the level of information that it is possible to extract from big data. For these reasons, characterizing the centrality of nodes in multiplex networks and finding new ways to solve challenging inference problems defined on multiplex networks are fundamental questions of network science. In this paper, we discuss the relevance of the Multiplex PageRank algorithm for measuring the centrality of nodes in multilayer networks and we characterize the utility of the recently introduced indicator function Θ̃S for describing their mesoscale organization and community structure. As working examples for studying these measures, we consider three multiplex network datasets coming from social science.

  8. Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes

    Science.gov (United States)

    Finch, Dezon Kile

    2012-01-01

    Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…

  9. A Survey on Semantic Focused Crawler For Mining Service Information

    OpenAIRE

    Thakor, Aneri; Singh, Dheeraj Kumar

    2015-01-01

    Focused crawlers play a very important role in the field of web mining, extracting and indexing the web pages that are most relevant to a predefined topic. However, heterogeneity, ubiquity and ambiguity are major issues in these web pages. Various semantic focused crawlers are therefore used to extract and annotate the retrieved web pages according to semantic web technology in order to overcome these three issues. This paper is intended as a survey of semantic focused crawlers.
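
    As a point of contrast with the semantic crawlers surveyed, a plain (non-semantic) focused crawler often ranks candidate pages with a simple term-overlap relevance score, as in the toy sketch below; the page text and topic terms are invented, and a semantic focused crawler would replace this scoring with ontology-based matching.

```python
def relevance(page_text, topic_terms):
    """Fraction of page tokens that match the predefined topic terms."""
    words = page_text.lower().split()
    hits = sum(words.count(term) for term in topic_terms)
    return hits / max(len(words), 1)

page = "cloud service discovery and service annotation for transport services"
score = relevance(page, {"service", "transport", "annotation"})
print(score)   # crawl outgoing links only if the score exceeds a chosen threshold
```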

  10. PPInterFinder--a mining tool for extracting causal relations on human proteins from literature.

    Science.gov (United States)

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problems in biomedical text mining is to mine protein-protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding various biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool to extract human PPIs from the biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most of the existing systems. DATABASE URL: http://www.biomining-bu.in/ppinterfinder/
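
    A bare-bones sketch of the co-occurrence idea behind the first two phases (a relation keyword plus a candidate protein pair in one sentence); PPInterFinder itself adds Tregex parsing and the 11 syntactic patterns on top of this, and the keyword list, protein list and sentence here are invented.

```python
import re

RELATION_WORDS = {"interacts", "binds", "phosphorylates", "activates"}

def candidate_ppis(sentence, proteins):
    """Return protein pairs that co-occur with a relation keyword in the sentence."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence.lower()))
    found = [p for p in proteins if p.lower() in tokens]
    if len(found) >= 2 and tokens & RELATION_WORDS:
        return [(found[i], found[j])
                for i in range(len(found)) for j in range(i + 1, len(found))]
    return []

print(candidate_ppis("TP53 binds MDM2 in the nucleus.", ["TP53", "MDM2", "BRCA1"]))
# [('TP53', 'MDM2')]
```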

  11. What is the relationship of medical humanitarian organisations with mining and other extractive industries?

    Directory of Open Access Journals (Sweden)

    Philippe Calain

    Full Text Available Philippe Calain discusses the health and environmental hazards of extractive industries like mining and explores the tensions that arise when medical humanitarian organizations are called to intervene in emergencies involving the extractive sector.

  12. Image mining and Automatic Feature extraction from Remotely Sensed Image (RSI using Cubical Distance Methods

    Directory of Open Access Journals (Sweden)

    S.Sasikala

    2013-04-01

    Full Text Available Information processing and decision support systems using image mining techniques are advancing rapidly with the huge availability of remote sensing images (RSI). An RSI describes inherent properties of objects by recording their natural reflectance in the electromagnetic spectral (ems) region. Information on such objects can be gathered from their color properties or their spectral values in various ems ranges, in the form of pixels. The present paper explains a method of such information extraction using the cubical distance method and presents the subsequent results. This method is one of the simpler ones in its approach and considers grouping of pixels on the basis of equal distance from a specified point in the image, i.e. a selected pixel having definite attribute values (DN) in different spectral layers of the RSI. The color distance and the pixel occurrence distance play a vital role in determining similar objects as clusters, which aid in extracting features in the RSI domain.
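
    The "equal distance from a selected pixel" idea can be illustrated with the Chebyshev (cube-shaped) distance over spectral bands, as in the sketch below; the tiny random image, seed pixel and threshold are invented, and this is not the paper's exact algorithm.

```python
import numpy as np

def cubical_mask(image, seed_rc, max_dist):
    """image: H x W x bands array of DN values. Returns a boolean mask of pixels whose
    spectral vector lies within a cube of half-side max_dist around the seed pixel."""
    seed = image[seed_rc]                          # spectral vector of the chosen pixel
    dist = np.abs(image - seed).max(axis=-1)       # Chebyshev distance, band-wise maximum
    return dist <= max_dist

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3))         # fake 3-band image of DN values
print(cubical_mask(img, (1, 1), max_dist=40))      # pixels grouped with the seed pixel
```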

  13. Personalized Web Services for Web Information Extraction

    CERN Document Server

    Jarir, Zahi; Erradi, Mahammed

    2011-01-01

    The field of information extraction from the Web emerged with the growth of the Web and the multiplication of online data sources. This paper is an analysis of information extraction methods. It presents a service oriented approach for web information extraction considering both web data management and extraction services. Then we propose an SOA based architecture to enhance flexibility and on-the-fly modification of web extraction services. An implementation of the proposed architecture is proposed on the middleware level of Java Enterprise Edition (JEE) servers.

  14. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Full Text Available Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  15. A semantic model for multimodal data mining in healthcare information systems.

    Science.gov (United States)

    Iakovidis, Dimitris; Smailis, Christos

    2012-01-01

    Electronic health records (EHRs) are representative examples of multimodal/multisource data collections, including measurements, images and free texts. The diversity of such information sources and the increasing amounts of medical data produced by healthcare institutes annually pose significant challenges in data mining. In this paper we present a novel semantic model that describes knowledge extracted from the lowest level of a data mining process, where information is represented by multiple features, i.e. measurements or numerical descriptors extracted from measurements, images, texts or other medical data, forming multidimensional feature spaces. Knowledge collected by manual annotation or extracted by unsupervised data mining from one or more feature spaces is modeled through generalized qualitative spatial semantics. This model enables a unified representation of knowledge across multimodal data repositories. It contributes to bridging the semantic gap by enabling direct links between low-level features and higher-level concepts, e.g. those describing body parts, anatomies and pathological findings. The proposed model has been developed in the web ontology language based on description logics (OWL-DL) and can be applied to a variety of data mining tasks in medical informatics. Its utility is demonstrated for automatic annotation of medical data.
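
    A toy illustration of the kind of link between a low-level feature region and a higher-level concept, expressed as RDF triples with rdflib; the namespace, class names and relation are hypothetical and do not come from the paper's ontology.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/multimodal#")   # hypothetical namespace

g = Graph()
# A feature region extracted from an image, linked to an anatomical concept
g.add((EX.region_17, RDF.type, EX.FeatureRegion))
g.add((EX.region_17, EX.extractedFrom, EX.ct_scan_42))
g.add((EX.region_17, EX.spatiallyContainedIn, EX.Liver))

for s, p, o in g:
    print(s, p, o)
```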

  16. 78 FR 35974 - Proposed Information Collection; Comment Request; Coal Mine Rescue Teams; Arrangements for...

    Science.gov (United States)

    2013-06-14

    ... Safety and Health Administration Proposed Information Collection; Comment Request; Coal Mine Rescue Teams... protecting the safety and health of miners. 30 CFR Part 49, Mine Rescue Teams, Subpart B--Mine Rescue Teams for Underground Coal Mines, sets standards related to the availability of mine rescue teams;...

  17. Multiple-Feature Extracting Modules Based Leak Mining System Design

    Science.gov (United States)

    Cho, Ying-Chiang; Pan, Jen-Yi

    2013-01-01

    Over the years, human dependence on the Internet has increased dramatically. A large amount of information is placed on the Internet and retrieved from it daily, which makes web security in terms of online information a major concern. In recent years, the most problematic issues in web security have been e-mail address leakage and SQL injection attacks. There are many possible causes of information leakage, such as inadequate precautions during the programming process, which lead to the leakage of e-mail addresses entered online or insufficient protection of database information, a loophole that enables malicious users to steal online content. In this paper, we implement a crawler mining system that is equipped with SQL injection vulnerability detection, by means of an algorithm developed for the web crawler. In addition, we analyze portal sites of the governments of various countries or regions in order to investigate the information leaking status of each site. Subsequently, we analyze the database structure and content of each site, using the data collected. Thus, we make use of practical verification in order to focus on information security and privacy through black-box testing. PMID:24453892
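
    A minimal sketch of the e-mail address leakage side of such a crawler: scan fetched page text for exposed addresses with a regular expression. The SQL injection detection component is not shown, and the HTML snippet and addresses are invented.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def leaked_emails(page_text):
    """Return the distinct e-mail addresses exposed in a fetched page."""
    return sorted(set(EMAIL_RE.findall(page_text)))

html = "<p>Contact: webmaster@example.gov or admin@example.gov for support</p>"
print(leaked_emails(html))   # ['admin@example.gov', 'webmaster@example.gov']
```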

  18. An unsupervised text mining method for relation extraction from biomedical literature.

    Science.gov (United States)

    Quan, Changqin; Wang, Meng; Ren, Fuji

    2014-01-01

    The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. Pattern clustering algorithm is based on Polynomial Kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) Protein-protein interactions extraction, and (2) Gene-suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods. The three supervised methods are rule based, SVM based, and Kernel based separately. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation on gene-suicide association extraction on a smaller dataset from Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than co-occurrence based method.

  19. An unsupervised text mining method for relation extraction from biomedical literature.

    Directory of Open Access Journals (Sweden)

    Changqin Quan

    Full Text Available The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. The pattern clustering algorithm is based on the Polynomial Kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein-protein interaction extraction, and (2) gene-suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods, which are rule based, SVM based, and kernel based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation of gene-suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than the co-occurrence based method.

  20. An Application for Data Preprocessing and Models Extractions in Web Usage Mining

    Directory of Open Access Journals (Sweden)

    Claudia Elena DINUCA

    2011-11-01

    Full Text Available Web servers worldwide generate a vast amount of information on web users' browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. The goal of this application is to analyze user behaviour by mining enriched web access log data. With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, the volumes of clickstream and user data collected by Web-based organizations in their daily operations have reached astronomical proportions. This information can be exploited in various ways, such as enhancing the effectiveness of websites or developing directed web marketing campaigns. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common needs or interests. In this paper we focus on showing how the application for data preprocessing and for extracting different data models from web log data was implemented, using association rules as a data mining technique to extract potentially useful knowledge from web usage data. We find different navigation patterns by analysing the log files of the website. I implemented the application in Java using the NetBeans IDE. For exemplification, I used log file data from a commercial web site, www.nice-layouts.com.
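
    The preprocessing step typically starts by parsing raw access-log lines into structured fields on which association rules can later be mined. The original application was written in Java; the sketch below shows the same idea in Python for a Common Log Format line, with a made-up log entry.

```python
import re

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\S+)'
)

line = '192.168.0.7 - - [10/Oct/2011:13:55:36 +0200] "GET /layouts/blue.css HTTP/1.1" 200 2326'
match = LOG_RE.match(line)
if match:
    print(match.groupdict())   # ip, timestamp, method, requested URL, status code, bytes sent
```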

  1. Mining heterogeneous information networks principles and methodologies

    CERN Document Server

    Sun, Yizhou

    2012-01-01

    Real-world physical and abstract data objects are interconnected, forming gigantic, interconnected networks. By structuring these data objects and interactions between these objects into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, scientific, engineering, or medical information systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks. Therefore, effective analysis of large-scale

  2. Discovering the Hidden Secrets in Your Data - the Data Mining Approach to Information

    Directory of Open Access Journals (Sweden)

    Michael Lloyd-Williams

    1997-01-01

    Full Text Available Nowadays, digital information is relatively easy to capture and fairly inexpensive to store. The digital revolution has seen collections of data grow in size, and the complexity of the data therein increase. Advances in technology have resulted in our ability to meaningfully analyse and understand the data we gather lagging far behind our ability to capture and store these data . It is often the case that large collections of data, however well structured, conceal implicit patterns of information that cannot be readily detected by conventional analysis techniques . Such information may often be usefully analysed using a set of techniques referred to as knowledge discovery or data mining. These techniques essentially seek to build a better understanding of data, and in building characterisations of data that can be used as a basis for further analysis, extract value from volume. This paper describes a number of empirical studies of the use of the data mining approach to the analysis of health information.

  3. Identification of Contamination Information of Vegetation in Coal Mines Based on Hyperspectral Remote Sensing Data

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    The development and application of hyperspectral remote sensing (HRS) to the environmental investigation and evaluation of coal mines are discussed in detail. By using Hyperion HRS technology and field spectrum measurements, and integrating traditional geological methods as well as laboratory chemical measurements, the absorption spectrum features and the spectral variation rules of vegetation affected by coal mine waste piles were studied. Based on spectral modeling methods and the Vegetation Red Edge Parameter (VREP), the diagnostic spectral information and spectral variation parameters were extracted, and the mapping methods of the VREP were researched. The spatial distributions of contaminated vegetation were thus quickly identified. This study provides technical support for the environmental investigation and pollution management of coal mines.
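
    One commonly used red edge parameter is the red edge position, the wavelength of the steepest rise in reflectance between roughly 680 and 750 nm; the sketch below estimates it from invented reflectance values and is only an illustration, not the study's VREP procedure.

```python
import numpy as np

wavelengths = np.array([680, 690, 700, 710, 720, 730, 740, 750])            # nm
reflectance = np.array([0.05, 0.07, 0.12, 0.22, 0.35, 0.44, 0.48, 0.50])    # invented spectrum

slope = np.diff(reflectance) / np.diff(wavelengths)        # first derivative of the spectrum
red_edge_position = wavelengths[:-1][np.argmax(slope)] + 5  # midpoint of the steepest interval
print(red_edge_position)   # a shift of this position toward shorter wavelengths suggests stress
```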

  4. Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies

    Directory of Open Access Journals (Sweden)

    Jia-Fu Chang

    2013-01-01

    Full Text Available Background: In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC tissue examination of populations of tumors. Natural language processing (NLP techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP. Materials and Methods: Our method takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifest by the tumor population under study. Characteristically, percentages are represented straightforwardly with the % symbol or as the number of positive findings of the total population. Such text is readily recognized using regular expressions and templates permitting extraction of sentences containing these forms for further analysis using grammatical structures and rule-based algorithms. Results: Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval as reflected in scores of 69.91% precision and 57.25% recall with an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings. Conclusions: The experimental pathology literature represents a rich source of pathobiological information, which has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP techniques support practical text mining techniques for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems.
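
    The two surface forms described above (an explicit percentage, or a positive count out of a total) can be captured with a simple regular expression, as in the sketch below; the example sentence is invented, and the real system adds template and grammar-based analysis on top of this.

```python
import re

PCT = re.compile(r"(?:(\d+)\s*/\s*(\d+)|(\d+(?:\.\d+)?)\s*%)")

def expression_percentages(sentence):
    """Return expression levels as percentages from 'n/m' counts or explicit '%' mentions."""
    values = []
    for num, den, pct in PCT.findall(sentence):
        values.append(100.0 * int(num) / int(den) if den else float(pct))
    return values

s = "CD20 was expressed in 38/45 cases, whereas CD10 positivity was seen in 62% of cases."
print(expression_percentages(s))   # [84.4..., 62.0]
```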

  5. Information- Theoretic Analysis for the Difficulty of Extracting Hidden Information

    Institute of Scientific and Technical Information of China (English)

    ZHANG Wei-ming; LI Shi-qu; CAO Jia; LIU Jiu-fen

    2005-01-01

    The difficulty of extracting hidden information, which is essentially a kind of secrecy, is analyzed by an information-theoretic method. The relations between key rate, message rate, hiding capacity and difficulty of extraction are studied in terms of the unicity distance of the stego-key, and the theoretical conclusions are used to analyze actual extraction attacks on Least Significant Bit (LSB) steganographic algorithms.
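
    For reference, the classical Shannon unicity distance that this kind of analysis adapts to stego-keys is given below; the notation is generic and not taken from the paper itself.

```latex
% Number of observed symbols beyond which the key is, in expectation, uniquely determined:
\[
  n_0 \;=\; \frac{H(K)}{D},
  \qquad
  D \;=\; \log_2 |\mathcal{A}| - H_L ,
\]
% where $H(K)$ is the entropy of the key, $|\mathcal{A}|$ is the alphabet size,
% $H_L$ is the per-symbol entropy of the source, and $D$ is its per-symbol redundancy.
```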

  6. Synthetic information prediction system for crisis mine based on GIS

    Institute of Scientific and Technical Information of China (English)

    Yuxin Ye; Ping Yu; Shi Wang; Shuisheng Ye

    2006-01-01

    Reserves of some types of crisis mines are, or soon will be, insufficient because of a serious shortage of mineral resource reserves and a crisis in the supporting exploration bases, so it is urgent to predict, appraise, develop and utilize the replaceable resources of the crisis mines. The mineral resources prediction software system of synthetic information is an intelligent GIS used for the quantitative prediction of large-scale synthetic-information mineral targets. It takes the geological body and the mineral resource body as a unit. It analyzes the ore deposit genesis and metallotect, determines the spatial distribution laws of the ore deposit and ore body, and establishes the prospecting model based on the concept of establishing the three-dimensional space of a mine. This paper primarily discusses several important problems, as follows: the secondary development of various kinds of data (including geology, geophysical prospecting, geochemical prospecting and remote sensing, etc.); synthetic processing and establishment of the synthetic-information interpretative map base; correspondence of the prospecting model with the synthetic information of the ore deposit; division into statistical units of metallogenic-information synthetic anomalies based on the synthetic information anomalies of ore control, followed by synthetic research of the metallogenic information variables of each unit and quantitative prediction using the quantitative-prediction mathematical model suited to the demands of large-scale precision; and, finally, optimization of the target areas of the ore deposit (body).

  7. Compact Weighted Class Association Rule Mining using Information Gain

    CERN Document Server

    Ibrahim, S P Syed

    2011-01-01

    Weighted association rule mining reflects the semantic significance of an item by considering its weight. Classification constructs a classifier and predicts the class of new data instances. This paper proposes a compact weighted class association rule mining method, which applies weighted association rule mining to classification and constructs an efficient weighted associative classifier. The proposed associative classification algorithm chooses one informative non-class attribute from the dataset, and all the weighted class association rules are generated based on that attribute. The weight of an item is considered as one of the parameters in generating the weighted class association rules. The proposed algorithm calculates the weights using the HITS model. Experimental results show that the proposed system generates a smaller number of high-quality rules, which improves the classification accuracy.
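
    A minimal information-gain computation of the kind that can be used to pick the single most informative non-class attribute; the toy dataset below is invented, and this is not the paper's weighted rule generation itself.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def information_gain(rows, labels, attr_index):
    """Reduction in class entropy obtained by splitting on one attribute."""
    remainder = 0.0
    for value in set(row[attr_index] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr_index] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

rows = [("sunny", "high"), ("sunny", "low"), ("rain", "high"), ("rain", "low")]
labels = ["no", "yes", "yes", "yes"]
print(information_gain(rows, labels, attr_index=0))   # gain of the first attribute
```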

  8. Application of the Deformation Information System for automated analysis and mapping of mining terrain deformations - case study from SW Poland

    Science.gov (United States)

    Blachowski, Jan; Grzempowski, Piotr; Milczarek, Wojciech; Nowacka, Anna

    2015-04-01

    Monitoring, mapping and modelling of mining induced terrain deformations are important tasks for quantifying and minimising threats that arise from underground extraction of useful minerals and affect surface infrastructure, human safety, the environment and security of the mining operation itself. The number of methods and techniques used for monitoring and analysis of mining terrain deformations is wide and expanding with the progress in geographical information technologies. These include for example: terrestrial geodetic measurements, Global Navigation Satellite Systems, remote sensing, GIS based modelling and spatial statistics, finite element method modelling, geological modelling, empirical modelling using e.g. the Knothe theory, artificial neural networks, fuzzy logic calculations and other. The presentation shows the results of numerical modelling and mapping of mining terrain deformations for two cases of underground mining sites in SW Poland, hard coal one (abandoned) and copper ore (active) using the functionalities of the Deformation Information System (DIS) (Blachowski et al, 2014 @ http://meetingorganizer.copernicus.org/EGU2014/EGU2014-7949.pdf). The functionalities of the spatial data modelling module of DIS have been presented and its applications in modelling, mapping and visualising mining terrain deformations based on processing of measurement data (geodetic and GNSS) for these two cases have been characterised and compared. These include, self-developed and implemented in DIS, automation procedures for calculating mining terrain subsidence with different interpolation techniques, calculation of other mining deformation parameters (i.e. tilt, horizontal displacement, horizontal strain and curvature), as well as mapping mining terrain categories based on classification of the values of these parameters as used in Poland. Acknowledgments. This work has been financed from the National Science Centre Project "Development of a numerical method of

  9. Enhanced Pattern Representation in Information Extraction

    Institute of Scientific and Technical Information of China (English)

    廖乐健; 曹元大; 张映波

    2004-01-01

    Traditional pattern representations in information extraction lack the ability to represent domain-specific concepts and are therefore inflexible. To overcome these restrictions, an enhanced pattern representation is designed which includes ontological concepts, neighboring-tree structures and soft constraints. An information-extraction inference engine based on hypothesis generation and conflict resolution is implemented. The proposed technique is successfully applied to an information extraction system for the Chinese-language query front-end of a job-recruitment search engine.

  10. Mining of hospital laboratory information systems

    DEFF Research Database (Denmark)

    Søeby, Karen; Jensen, Peter Bjødstrup; Werge, Thomas

    2015-01-01

    of hospital laboratory data as a source of information, we analyzed enzymatic plasma creatinine as a model analyte in two large pediatric hospital samples. Methods: Plasma creatinine measurements from 9700 children aged 0-18 years were obtained from hospital laboratory databases and partitioned into high......-resolution gender- and age-groups. Normal probability plots were used to deduce parameters of the normal distributions from healthy creatinine values in the mixed hospital datasets. Furthermore, temporal trajectories were generated from repeated measurements to examine developmental patterns in periods of changing...... in creatinine levels at different time points after birth and around the early teens, which challenges the establishment and usefulness of reference intervals in those age groups. Conclusions: The study documents that hospital laboratory data may inform on the developmental aspects of creatinine, on periods...

  11. Data Mining Research for Information Security

    Science.gov (United States)

    2016-01-29

    analysis using taint propagation on virtual machine monitor." In this paper, a method for dynamically interpreting semantics information using taint... Machine-learning and Ontology Assisted Assessment of Research Trends (MOAART) advances machine learning by developing and testing an ontology-based inferencing engine to filter, sort and rank abstracts in specific research areas. The MOAART reports

  12. An Agent Based System Framework for Mining Data Record Extraction from Search Engine Result Pages

    Directory of Open Access Journals (Sweden)

    Dr.K.L Shunmuganathan

    2012-04-01

    Full Text Available Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data in an efficient and reliable way. Information extraction (IE) from semistructured Web documents plays an important role for a variety of information agents. In this paper, a framework for a WebIE system built on the JADE platform is proposed; it uses a non-visual automatic wrapper to extract data records from search engine result pages, which contain important information for meta search engines and computer users. It describes the different agents used in WebIE, how they communicate with each other and how they are managed. A Multi Agent System (MAS) provides an efficient, decentralized way for agents to communicate. A prototype model is developed for the purposes of the study, showing how it can be used to solve the complex problems that arise in WebIE. Our wrapper consists of a series of agent filters that detect and remove irrelevant data regions from the web page. In this paper, we propose a highly effective and efficient algorithm for automatically mining result records from search engine response pages.

  13. 30 CFR 905.783 - Underground mining permit applications-Minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-Minimum... MINING OPERATIONS WITHIN EACH STATE CALIFORNIA § 905.783 Underground mining permit applications—Minimum requirements for information on environmental resources. (a) Part 783 of this chapter, Underground Mining...

  14. 30 CFR 922.783 - Underground mining permit applications-minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-minimum... MINING OPERATIONS WITHIN EACH STATE MICHIGAN § 922.783 Underground mining permit applications—minimum requirements for information on environmental resources. Part 783 of this chapter, Underground Mining Permit...

  15. 30 CFR 903.783 - Underground mining permit applications-Minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-Minimum... MINING OPERATIONS WITHIN EACH STATE ARIZONA § 903.783 Underground mining permit applications—Minimum requirements for information on environmental resources. (a) Part 783 of this chapter, Underground Mining...

  16. 30 CFR 937.783 - Underground mining permit applications-minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-minimum... MINING OPERATIONS WITHIN EACH STATE OREGON § 937.783 Underground mining permit applications—minimum requirements for information on environmental resources. Part 783 of this chapter, Underground Mining Permit...

  17. 30 CFR 939.783 - Underground mining permit applications-minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-minimum... MINING OPERATIONS WITHIN EACH STATE RHODE ISLAND § 939.783 Underground mining permit applications—minimum requirements for information on environmental resources. Part 783 of this chapter, Underground Mining Permit...

  18. 30 CFR 912.783 - Underground mining permit applications-minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-minimum... MINING OPERATIONS WITHIN EACH STATE IDAHO § 912.783 Underground mining permit applications—minimum requirements for information on environmental resources. Part 783 of this chapter, Underground Mining Permit...

  19. 30 CFR 910.783 - Underground mining permit applications-minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-minimum... MINING OPERATIONS WITHIN EACH STATE GEORGIA § 910.783 Underground mining permit applications—minimum requirements for information on environmental resources. Part 783 of this chapter, Underground Mining Permit...

  20. 77 FR 58170 - Proposed Renewal of Existing Information Collection; Fire Protection (Underground Coal Mines)

    Science.gov (United States)

    2012-09-19

    ... (Underground Coal Mines) AGENCY: Mine Safety and Health Administration, Labor. ACTION: Request for public... (facsimile). SUPPLEMENTARY INFORMATION: I. Background Fire protection standards for underground coal mines....1100 requires that each coal mine be provided with suitable firefighting equipment adapted for the...

  1. Research on the information extraction method for mine ecological environment destruction based on remote sensing images

    Institute of Scientific and Technical Information of China (English)

    潘洁晨

    2012-01-01

    Multi - source remote sensing data has been widely used in mine environment monitoring, but extracting the mine ecological environment destruction information by the method combined high-resolution remote sensing images with mathematic morphology is seldom applied. This paper applied the methods of mathematical morphology combined with MATLAB application platform, high resolution IKONS data, detected various traditional image interpreting to compare with the way which is used in this paper, field measured data verify that it is very effective to apply the methods of mathematical morphology to special target extraction,which opens up a new way for mine environmental protection, supervision and management.%多源遥感卫星数据在矿山环境监测中得到了广泛的应用,而高分辨率遥感影像数据结合应用数学形态学方法提取矿山活动信息的方法在提取矿山生态环境破坏范围的应用还很少见.所以,本文试图利用数学形态学的方法,结合Matlab应用程序平台,运用IKONS高分辨率遥感影像数据,采用多种传统图像处理算法与本文算法进行比较,并在野外数据的验证下,证实数学形态学方法应用于提取矿山活动信息的可行性,为矿山环境保护、监控和管理开拓了新的思路.

  2. Mining Matters : Natural Resource Extraction and Local Business Constraints

    NARCIS (Netherlands)

    de Haas, Ralph; Poelhekke, Steven

    2016-01-01

    We estimate the impact of local mining activity on the business constraints experienced by 22,150 firms across eight resource-rich countries. We find that with the presence of active mines, the business environment in the immediate vicinity (<20 km) of a firm deteriorates but business constraints of

  4. MeInfoText: associated gene methylation and cancer information from text mining

    Directory of Open Access Journals (Sweden)

    Juan Hsueh-Fen

    2008-01-01

    Full Text Available Abstract Background DNA methylation is an important epigenetic modification of the genome. Abnormal DNA methylation may result in silencing of tumor suppressor genes and is common in a variety of human cancer cells. As more epigenetics research is published electronically, it is desirable to extract relevant information from biological literature. To facilitate epigenetics research, we have developed a database called MeInfoText to provide gene methylation information from text mining. Description MeInfoText presents comprehensive association information about gene methylation and cancer, the profile of gene methylation among human cancer types and the gene methylation profile of a specific cancer type, based on association mining from large amounts of literature. In addition, MeInfoText offers integrated protein-protein interaction and biological pathway information collected from the Internet. MeInfoText also provides pathway cluster information regarding sets of genes which may contribute to the development of cancer due to aberrant methylation. The extracted evidence, with highlighted keywords and the gene names identified in each methylation-related abstract, can also be retrieved. The database is now available at http://mit.lifescience.ntu.edu.tw/. Conclusion MeInfoText is a unique database that provides comprehensive gene methylation and cancer association information. It will complement existing DNA methylation information and will be useful in epigenetics research and the prevention of cancer.
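
    A toy illustration of the abstract-level association mining the record describes: counting how often a gene name and a cancer type are mentioned in the same abstract (the gene and cancer dictionaries and the abstracts below are hypothetical; MeInfoText itself relies on much richer association mining):

```python
from collections import Counter
from itertools import product

# Hypothetical dictionaries; a real system would use curated gene/disease lexicons.
GENES = {"MLH1", "CDKN2A", "BRCA1"}
CANCERS = {"colorectal cancer", "melanoma", "breast cancer"}

def cooccurrence_counts(abstracts):
    """Count (gene, cancer) pairs that co-occur within the same abstract."""
    counts = Counter()
    for text in abstracts:
        lower = text.lower()
        genes = {g for g in GENES if g.lower() in lower}
        cancers = {c for c in CANCERS if c in lower}
        counts.update(product(sorted(genes), sorted(cancers)))
    return counts

docs = [
    "Promoter methylation of MLH1 is frequent in sporadic colorectal cancer.",
    "CDKN2A hypermethylation was observed in melanoma and colorectal cancer samples.",
]
for (gene, cancer), n in cooccurrence_counts(docs).most_common():
    print(f"{gene} - {cancer}: {n}")
```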

  5. BSQA: integrated text mining using entity relation semantics extracted from biological literature of insects.

    Science.gov (United States)

    He, Xin; Li, Yanen; Khetani, Radhika; Sanders, Barry; Lu, Yue; Ling, Xu; Zhai, Chengxiang; Schatz, Bruce

    2010-07-01

    Text mining is one promising way of extracting information automatically from the vast biological literature. To maximize its potential, the knowledge encoded in the text should be translated to some semantic representation such as entities and relations, which could be analyzed by machines. But large-scale practical systems for this purpose are rare. We present BeeSpace question/answering (BSQA) system that performs integrated text mining for insect biology, covering diverse aspects from molecular interactions of genes to insect behavior. BSQA recognizes a number of entities and relations in Medline documents about the model insect, Drosophila melanogaster. For any text query, BSQA exploits entity annotation of retrieved documents to identify important concepts in different categories. By utilizing the extracted relations, BSQA is also able to answer many biologically motivated questions, from simple ones such as, which anatomical part is a gene expressed in, to more complex ones involving multiple types of relations. BSQA is freely available at http://www.beespace.uiuc.edu/QuestionAnswer.

  6. Natural radioactivity in commercial granites extracted near old uranium mines: scientific, economic and social impact of disinformation.

    Science.gov (United States)

    Pereira, Dolores; Pereira, Alcides; Neves, Luis

    2015-04-01

    The study of radioactivity in natural stones is a subject of great interest from different points of view: scientific, social and economic. Several previous studies have demonstrated that the radioactivity is dependent, not only on the uranium content, but also on the structures, textures, minerals containing the uranium and degree of weathering of the natural stone. Villavieja granite is extracted in a village where uranium mining was an important activity during the 20th century. Today the mine is closed but the granite is still extracted. Incorrect information about natural radioactivity given to natural stone users, policy makers, construction managers and the general public has caused turmoil in the media for many years. This paper considers problems associated with the communication of reliable information, as well as uncertainties, on natural radioactivity to these audiences.

  7. A management information system for mine railway transportation equipment

    Institute of Scientific and Technical Information of China (English)

    LI Mei-yu; HAN Ke-qi; ZHANG Xiao-yong; LI Xiao-lin; ZHAI Yong-jun; YU Wei-hu

    2008-01-01

    Good equipment management is essential for the day to day management of an enterprise. Targeted at production operation of the railway transportation department of a mining group and aimed at mine railway equipment management, we have established a management information system for the equipment in the entire process of the life cycle of equipment. The project deals with basic data about equipment, initial management, maintenance, operation and even disposal, based on a C/S and B/S structure. We adopted an object-oriented approach, dealing with software engineering, information engineering, economic and organizational measures. Thus, effective monitoring and control of the operation of railway equipment and its status in the entire process has been achieved.

  8. Improving structural medical process comparison by exploiting domain knowledge and mined information.

    Science.gov (United States)

    Montani, Stefania; Leonardi, Giorgio; Quaglini, Silvana; Cavallini, Anna; Micieli, Giuseppe

    2014-09-01

    Process model comparison and similar process retrieval is a key issue to be addressed in many real-world situations, and a particularly relevant one in medical applications, where similarity quantification can be exploited to accomplish goals such as conformance checking, local process adaptation analysis, and hospital ranking. In this paper, we present a framework that allows the user to: (i) mine the actual process model from a database of process execution traces available at a given hospital; and (ii) compare (mined) process models. The tool is currently being applied in stroke management. Our framework relies on process mining to extract process-related information (i.e., process models) from data. As for process comparison, we have modified a state-of-the-art structural similarity metric by exploiting: (i) domain knowledge; (ii) process mining outputs and statistical temporal information. These changes were meant to make the metric more suited to the medical domain. Experimental results showed that our metric outperforms the original one, and generated output closer to that provided by a stroke management expert. In particular, our metric correctly rated 11 out of 15 mined hospital models with respect to a given query. On the other hand, the original metric correctly rated only 7 out of 15 models. The experiments also showed that the framework can support stroke management experts in answering key research questions: in particular, average patient improvement decreased as the distance (according to our metric) from the top level hospital process model increased. The paper shows that process mining and process comparison, through a similarity metric tailored to medical applications, can be applied successfully to clinical data to gain a better understanding of different medical processes adopted by different hospitals, and of their impact on clinical outcomes. In the future, we plan to make our metric even more general and efficient, by explicitly
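
    The record's similarity metric extends a structural comparison of mined process models with domain knowledge and temporal statistics; as a much simpler stand-in, the sketch below scores two mined models by the overlap of their activities and directly-follows edges (the example pathway fragments are hypothetical and the Jaccard-based score is not the authors' metric):

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

def model_similarity(edges_a, edges_b, w_nodes=0.5, w_edges=0.5):
    """Compare two process models given as sets of directly-follows edges."""
    nodes_a = {n for e in edges_a for n in e}
    nodes_b = {n for e in edges_b for n in e}
    return w_nodes * jaccard(nodes_a, nodes_b) + w_edges * jaccard(set(edges_a), set(edges_b))

# Hypothetical stroke-pathway fragments mined from two hospitals' event logs.
hospital_a = {("admission", "ct_scan"), ("ct_scan", "thrombolysis"), ("thrombolysis", "ward")}
hospital_b = {("admission", "ct_scan"), ("ct_scan", "ward")}
print(round(model_similarity(hospital_a, hospital_b), 3))
```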

  9. DiMeX: A Text Mining System for Mutation-Disease Association Extraction.

    Science.gov (United States)

    Mahmood, A S M Ashique; Wu, Tsung-Jung; Mazumder, Raja; Vijay-Shanker, K

    2016-01-01

    The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation to disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has been also evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low precision problems suffered by most approaches. DiMeX was applied on a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators to enrich mutation databases.
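
    DiMeX applies syntactic and semantic patterns; as a hedged illustration of only its simplest sub-task, the snippet below uses a regular expression to spot protein point-mutation mentions such as "V600E" in text (a toy pattern, not the DiMeX pattern set):

```python
import re

# One-letter amino-acid codes; matches substitutions like "V600E" or "p.V600E".
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b(?:p\.)?([{AA}])(\d+)([{AA}])\b")

def find_mutations(text):
    """Return (wild-type residue, position, mutant residue) tuples found in text."""
    return [(m.group(1), int(m.group(2)), m.group(3)) for m in MUTATION_RE.finditer(text)]

sentence = "The BRAF V600E mutation and the p.R175H variant of TP53 are associated with cancer."
print(find_mutations(sentence))  # [('V', 600, 'E'), ('R', 175, 'H')]
```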

  10. Extracting laboratory test information from biomedical text

    Directory of Open Access Journals (Sweden)

    Yanna Shen Kang

    2013-01-01

    Full Text Available Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure.

  11. Application of GIS to Geological Information Extraction

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    GIS, a powerful tool for processing spatial data, is advantageous in its spatial overlaying. In this paper, GIS is applied to the extraction of geological information. Information associated with mineral resources is chosen to delineate the geo-anomalies, the basis of ore-forming anomalies and of mineral-deposit location. This application is illustrated with an example in Weixi area, Yunnan Province.

  12. Relative extraction ratio (RER) for arsenic and heavy metals in soils and tailings from various metal mines, Korea.

    Science.gov (United States)

    Son, Hye Ok; Jung, Myung Chae

    2011-01-01

    This study focused on the evaluation of leaching behaviours for arsenic and heavy metals (Cd, Cu, Ni, Pb and Zn) in soils and tailings contaminated by mining activities. Ten representative mine soils were taken at four representative metal mines in Korea. To evaluate the leaching characteristics of the samples, eight extraction methods were adapted namely 0.1 M HCl, 0.5 M HCl, 1.0 M HCl, 3.0 M HCl, Korean Standard Leaching Procedure for waste materials (KSLP), Synthetic Precipitation Leaching Procedure (SPLP), Toxicity Characteristic Leaching Procedure (TCLP) and aqua regia extraction (AR) methods. In order to compare element concentrations across extraction methods, relative extraction ratios (RERs, %), defined as element concentration extracted by the individual leaching method divided by that extracted by aqua regia based on USEPA method 3050B, were calculated. Although the RER values can vary with sample type and element, they increase with increasing ionic strength of each extracting solution. Thus, the RER for arsenic and heavy metals in the samples increased in order of the ionic strength of the extraction methods. In the KSLP extraction method, the RER values for Cd and Zn were relatively higher than those for As, Cu, Ni and Pb. This may be due to differences in geochemical behaviour of each element, namely high solubility of Cd and Zn and low solubility of As, Cu, Ni and Pb in surface environment. Thus, the extraction results can give important information on the degree and extent of arsenic and heavy metal dispersion in the surface environment.
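
    The RER definition quoted above reduces to a one-line calculation; a small helper (with illustrative numbers, not values from the study) might look like:

```python
def relative_extraction_ratio(c_method_mg_kg, c_aqua_regia_mg_kg):
    """RER (%) = concentration extracted by a single leaching method
    divided by the aqua regia (pseudo-total) concentration, times 100."""
    return 100.0 * c_method_mg_kg / c_aqua_regia_mg_kg

# Illustrative only: 12 mg/kg Cd by KSLP versus 40 mg/kg Cd by aqua regia.
print(f"RER = {relative_extraction_ratio(12.0, 40.0):.1f}%")  # RER = 30.0%
```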

  13. Ergonomic, psychosocial factors and risks at work in informal mining

    Directory of Open Access Journals (Sweden)

    Milena Nunes Alves de Sousa

    2015-09-01

    Full Text Available The goal of this study was to identify ergonomic and psychosocial factors, and risks at informal work in the mining sector of the State of Paraíba, Brazil, from miners' perspective. A cross-sectional and descriptive study was conducted with 371 informal mining workers. They responded two questionnaires for assessing work performed in three dimensions: ergonomic factors; psychosocial factors; and occupational risks. The scores of the items of each dimension were added so that, the higher the score, the lower workers' satisfaction related to the area investigated. The results indicated that noise was common in the working environment (66%). Most workers (54.7%) pointed out that the work was too hard and that it required attention and reasoning (85.7%). The workers emphasized the lack of training for working in mining (59.3%) and few of them regarded the maintenance of the workplace as a component to prevent lumbago (32.3%). Risk of accidents was pointed out as the factor that needed increased attention in daily work (56.6%). All occupational risks were mentioned, including physical and chemical risks. There was significant correlation between age and occupational risks, indicating that the greater the age, the greater the perception of harmful agents (ρ = -0.23; p < 0.01). In the end, it was observed that, to a greater or lesser degree, all workers perceived ergonomic and psychosocial factors, and risks in informal mining. Length of service and age were the features that interfered significantly with the understanding of those factors and occupational risks.

  14. Data Mining: The Art of Automated Knowledge Extraction

    Science.gov (United States)

    Karimabadi, H.; Sipes, T.

    2012-12-01

    Data mining algorithms are used routinely in a wide variety of fields and they are gaining adoption in sciences. The realities of real world data analysis are that (a) data has flaws, and (b) the models and assumptions that we bring to the data are inevitably flawed, and/or biased and misspecified in some way. Data mining can improve data analysis by detecting anomalies in the data, check for consistency of the user model assumptions, and decipher complex patterns and relationships that would not be possible otherwise. The common form of data collected from in situ spacecraft measurements is multi-variate time series which represents one of the most challenging problems in data mining. We have successfully developed algorithms to deal with such data and have extended the algorithms to handle streaming data. In this talk, we illustrate the utility of our algorithms through several examples including automated detection of reconnection exhausts in the solar wind and flux ropes in the magnetotail. We also show examples from successful applications of our technique to analysis of 3D kinetic simulations. With an eye to the future, we provide an overview of our upcoming plans that include collaborative data mining, expert outsourcing data mining, computer vision for image analysis, among others. Finally, we discuss the integration of data mining algorithms with web-based services such as VxOs and other Heliophysics data centers and the resulting capabilities that it would enable.
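
    As a hedged, minimal stand-in for the kind of multivariate time-series anomaly detection mentioned above (not the authors' algorithms), the sketch below flags samples whose deviation from a rolling mean exceeds a z-score threshold in any channel:

```python
import numpy as np

def rolling_zscore_anomalies(series, window=50, z_thresh=4.0):
    """Flag time steps where any channel deviates strongly from its rolling mean.

    series: array of shape (n_samples, n_channels), e.g. in-situ spacecraft data.
    """
    n, _ = series.shape
    flags = np.zeros(n, dtype=bool)
    for t in range(window, n):
        ref = series[t - window:t]
        mu, sigma = ref.mean(axis=0), ref.std(axis=0) + 1e-12
        z = np.abs((series[t] - mu) / sigma)
        flags[t] = bool((z > z_thresh).any())
    return flags

# Toy demo: Gaussian noise with one injected spike at index 300.
rng = np.random.default_rng(1)
data = rng.normal(scale=0.1, size=(500, 3))
data[300] += 5.0
print(np.where(rolling_zscore_anomalies(data))[0])
```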

  15. The development of mine digitization information system and its application to the Lanping Pb-Zn mine

    Institute of Scientific and Technical Information of China (English)

    YAN Yongfeng; QIN Dexian; YU Yangxian; LI Shilei; JIA Fengqin; WANG Xiaoli; LIU Fangcheng

    2008-01-01

    This paper describes the development of a software package for a mine digitization information system, compiled according to the principles and methods of mathematical geology and geographic information systems using computer languages such as VB and Matlab. It introduces the functional composition and operating environment of this information system, and illustrates the functions and effectiveness of the software package using the Lanping Pb-Zn mine, Yunnan Province, as an example.

  16. The building and development of China's mine geographic information system

    Energy Technology Data Exchange (ETDEWEB)

    Wang, J.; Guo, D. [China University of Mining and Technology (China)

    1996-06-01

    The mine geographic information system (MGIS) is developed by applying GIS techniques to the unique features of mines. The functions of the system include: collecting, storing, processing, interrogating, analysing, forecasting, evaluating and outputting mine spatial information. It provides the scientific data required for planning and decision making in the management of mine production. This paper describes the development of China's MGIS, and discusses the current status and future development strategy. 2 refs., 2 figs.

  17. The role of conflict minerals, artisanal mining, and informal trading networks in African intrastate and regional conflicts

    Science.gov (United States)

    Chirico, Peter G.; Malpeli, Katherine C.

    2014-01-01

    The relationship between natural resources and armed conflict gained public and political attention in the 1990s, when it became evident that the mining and trading of diamonds were connected with brutal rebellions in several African nations. Easily extracted resources such as alluvial diamonds and gold have been and continue to be exploited by rebel groups to fund their activities. Artisanal and small-scale miners operating under a quasi-legal status often mine these mineral deposits. While many African countries have legalized artisanal mining and established flow chains through which production is intended to travel, informal trading networks frequently emerge in which miners seek to evade taxes and fees by selling to unauthorized buyers. These networks have the potential to become international in scope, with actors operating in multiple countries. The lack of government control over the artisanal mining sector and the prominence of informal trade networks can have severe social, political, and economic consequences. In the past, mineral extraction fuelled violent civil wars in Sierra Leone, Liberia, and Angola, and it continues to do so today in several other countries. The significant influence of the informal network that surrounds artisanal mining is therefore an important security concern that can extend across borders and have far-reaching impacts.

  18. Presentations from the 1992 Coal Mining Impoundment Informational Meeting

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    On May 20 and 21, 1992, the MSHA Coal Mining Impoundment Informational Meeting was held at the National Mine Health and Safety Academy in Beckley, West Virginia. Fifteen presentations were given on key issues involved in the design and construction of dams associated with coal mining. The attendees were told that to improve the consistency among the plan reviewers, engineers from the Denver and Pittsburgh Technical Support Centers meet twice annually to discuss specific technical issues. It was soon discovered that the topics being discussed needed to be shared with anyone involved with coal waste dam design, construction, or inspection. The only way to accomplish that goal was through the issuance of Procedure Instruction Letters. The Letters present a consensus of engineering philosophy that could change over time. They do not present policy or carry the force of law. Currently, thirteen position papers have been disseminated and more will follow as the need arises. The individual papers were not entered into the database.

  19. Personalized Multimedia Information Retrieval based on User Profile Mining

    Directory of Open Access Journals (Sweden)

    Pengyi Zhang

    2013-10-01

    Full Text Available This paper focuses on how to retrieve personalized multimedia information based on user interest, which can be mined from the user profile. After analyzing the related works, a general structure of the personalized multimedia information retrieval system is given, which combines an online module and an offline module. Firstly, we collect a large-scale collection of photos from multimedia information sharing websites. Then, we record the information of the users who upload the multimedia information. For a given user, we save his history data which could describe the multimedia data. Secondly, the relationship between contents of multimedia data and semantic information is analyzed and then the user interest model is constructed by a modified LDA model which can integrate all the influencing factors in the task of multimedia information retrieval. Thirdly, the query distributions of all the topics can be estimated by the proposed modified LDA model. Fourthly, based on the above offline computing process, the online personalized multimedia information ranking algorithm is given which utilizes the user interest model and the query word. Fifthly, multimedia information retrieval results are obtained using the proposed personalized multimedia information ranking algorithm. Finally, performance evaluation is conducted by a series of experiments to test the performance of the proposed algorithm compared with other methods on different datasets.

  20. Extraction of information from a single quantum

    OpenAIRE

    Paraoanu, G. S.

    2011-01-01

    We investigate the possibility of performing quantum tomography on a single qubit with generalized partial measurements and the technique of measurement reversal. Using concepts from statistical decision theory, we prove that, somewhat surprisingly, no information can be obtained using this scheme. It is shown that, irrespective of the measurement technique used, extraction of information from single quanta is at odds with other general principles of quantum physics.

  2. DKIE: Open Source Information Extraction for Danish

    DEFF Research Database (Denmark)

    Derczynski, Leon; Field, Camilla Vilhelmsen; Bøgh, Kenneth Sejdenfaden

    2014-01-01

    Danish is a major Scandinavian language spoken daily by around six million people. However, it lacks a unified, open set of NLP tools. This demonstration will introduce DKIE, an extensible open-source toolkit for processing Danish text. We implement an information extraction architecture for Danish...... independently or with the Stanford NLP toolkit....

  3. A Frequent Pattern Mining Algorithm for Feature Extraction of Customer Reviews

    Directory of Open Access Journals (Sweden)

    Seyed Hamid Ghorashi

    2012-07-01

    Full Text Available Online shoppers often have different ideas about the same product. They look for the product features that are consistent with their goal. Sometimes a feature might be interesting to one shopper while making no impression on another. Unfortunately, identifying the target product with particular features is a tough task which is not achievable with the existing functionality provided by common websites. In this paper, we present a frequent pattern mining algorithm to mine a collection of reviews and extract product features. Our experimental results indicate that the algorithm outperforms the older pattern mining techniques used by previous researchers.
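
    A minimal sketch of frequent-pattern-based feature extraction in the spirit of the record (not the authors' algorithm): candidate feature terms come from a small hypothetical vocabulary, and single terms plus term pairs that appear in at least `min_support` reviews are kept:

```python
from collections import Counter
from itertools import combinations

# Hypothetical candidate feature vocabulary; a real system would mine noun phrases.
CANDIDATES = {"battery", "screen", "camera", "price", "weight"}

def frequent_features(reviews, min_support=2):
    """Return frequent single features and feature pairs across reviews."""
    singles, pairs = Counter(), Counter()
    for review in reviews:
        words = set(review.lower().split())
        found = sorted(CANDIDATES & words)
        singles.update(found)
        pairs.update(combinations(found, 2))
    keep = lambda counts: {k: v for k, v in counts.items() if v >= min_support}
    return keep(singles), keep(pairs)

reviews = [
    "great battery life but the screen is dim",
    "screen quality is superb and the battery lasts long",
    "camera is average for the price",
]
print(frequent_features(reviews))
```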

  4. Rapid metal extractability tests from polluted mining soils by ultrasound probe sonication and microwave-assisted extraction systems.

    Science.gov (United States)

    García-Salgado, Sara; Quijano, M Ángeles

    2016-12-01

    Ultrasonic probe sonication (UPS) and microwave-assisted extraction (MAE) were used for rapid single extraction of Cd, Cr, Cu, Ni, Pb, and Zn from soils polluted by former mining activities (Mónica Mine, Bustarviejo, NW Madrid, Spain), using 0.01 mol L(-1) calcium chloride (CaCl2), 0.43 mol L(-1) acetic acid (CH3COOH), and 0.05 mol L(-1) ethylenediaminetetraacetic acid (EDTA) at pH 7 as extracting agents. The optimum extraction conditions by UPS consisted of an extraction time of 2 min for both CaCl2 and EDTA extractions and 15 min for CH3COOH extraction, at 30% ultrasound (US) amplitude, whereas in the case of MAE, they consisted of 5 min at 50 °C for both CaCl2 and EDTA extractions and 15 min at 120 °C for CH3COOH extraction. Extractable concentrations were determined by inductively coupled plasma atomic emission spectrometry (ICP-AES). The proposed methods were compared with a reduced version of the corresponding single extraction procedures proposed by the Standards, Measurements and Testing Programme (SM&T). The results obtained showed a great variability on extraction percentages, depending on the metal, the total concentration level and the soil sample, reaching high values in some areas. However, the correlation analysis showed that total concentration is the most relevant factor for element extractability in these soil samples. From the results obtained, the application of the accelerated extraction procedures, such as MAE and UPS, could be considered a useful approach to evaluate rapidly the extractability of the metals studied.

  5. A Platform for Supporting Knowledge Mining and Reuse Based on Context Information of a Project

    Directory of Open Access Journals (Sweden)

    I-Chin Wu

    2010-06-01

    Full Text Available Organizations implement Knowledge Management Systems (KMS) to maximize the effectiveness and reuse of knowledge assets in order to increase productivity and profitability. Thus, effective project management can place great demands on knowledge management solutions designed to support and streamline the execution of project-related tasks. Accordingly, in this work we extract knowledge from historical projects, design a project-in-context (PIC) meta-model, and deploy a platform that facilitates the capture and reuse of project-specific information based on the context. The research areas addressed in the work are as follows. (1) Knowledge acquisition: analyzing the type of project and its associated attributes and defining general, but essential, project context information based on the PIC model. (2) Knowledge discovery: the use of text mining and data mining techniques to extract knowledge items needed by workers, and discover the relationships between various knowledge items. (3) Knowledge utilization based on the context: with the proposed model and methods, several applications related to the reuse of project knowledge by pull- and push-based knowledge management strategies are developed to achieve effective project management. From the perspective of project management, the proposed model and system can help knowledge workers understand information about a current research project and resolve problems effectively. [Article content in Chinese; Extended abstract in English]

  6. Extracting a very thick seam at the Stara Jama mine in Yugoslavia

    Energy Technology Data Exchange (ETDEWEB)

    Bijelic, V.

    1986-01-01

    The Stara Jama mine is located in a brown coal field in the Northern part of the central Bosnian coal reserves. The main seam in the West Field attains an average seam thickness of 13.20 m. On account of its major thickness, good calorific value and relatively low depth, the extraction of the seam is of interest in economic terms. This article describes a fully mechanised longwall installation at Stara Jama mine for the complete extraction of thick coal seams. Comparison of the operating results of this method with those of the extraction method previously used leads to the conclusion that the longwalling applied at Stara Jama mine in conjunction with the top slicing method facilitates exploitation of the deposit with high operating results and a low safety risk.

  7. Extraction of spatio-temporal information of earthquake event based on semantic technology

    Science.gov (United States)

    Fan, Hong; Guo, Dan; Li, Huaiyuan

    2015-12-01

    In this paper a web information extraction method is presented which identifies a variety of thematic events using an event knowledge framework derived from text training, and then applies syntactic analysis to extract the key information of each event. By combining the semantic information of the text with domain knowledge of the event, the method makes the extraction of the information people are interested in more accurate. In this paper, web-based earthquake news extraction is taken as an example. The paper first outlines the overall approach, and then details the key algorithms and experiments of seismic event extraction. Finally, accuracy analysis and evaluation experiments are conducted which demonstrate that the proposed method is a promising way of mining hot events.

  8. 30 CFR 921.783 - Underground mining permit applications-minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-minimum... MINING OPERATIONS WITHIN EACH STATE MASSACHUSETTS § 921.783 Underground mining permit applications—minimum requirements for information on environmental resources. Part 783 of this chapter, Underground...

  9. 30 CFR 933.783 - Underground mining permit applications-minimum requirements for information on environmental...

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 3 2010-07-01 2010-07-01 false Underground mining permit applications-minimum... MINING OPERATIONS WITHIN EACH STATE NORTH CAROLINA § 933.783 Underground mining permit applications—minimum requirements for information on environmental resources. Part 783 of this chapter, Underground...

  10. 77 FR 64360 - Proposed Extension of Existing Information Collection; Mine Rescue Teams for Underground Metal...

    Science.gov (United States)

    2012-10-19

    ... Safety and Health Administration Proposed Extension of Existing Information Collection; Mine Rescue Teams...) to publish regulations which provide that mine rescue teams be available for rescue and recovery work... arrangements for such teams are to be borne by the operator of each such mine. II. Desired Focus of...

  11. An Approach to Mine Textual Information From Pubmed Database

    Directory of Open Access Journals (Sweden)

    G Charles Babu

    2012-05-01

    Full Text Available The web has greatly improved access to scientific literature. A wide spectrum of research data has been created and collected by researchers. However, textual information on the web is largely disorganized, with research articles spread across archive sites, institution sites, journal sites and researcher homepages. Data are widely available over the internet, and the many kinds of data pose challenges for storage and retrieval. Datasets can be made more accessible and user-friendly through annotation, aggregation, and cross-linking to other datasets. Biomedical datasets are growing exponentially, and new curated information appears regularly in research publications indexed by MedLine, PubMed, Science Direct, etc. Therefore, a context-based text-mining tool was developed in the Python language to search a huge database such as PubMed for a given keyword, retrieving data between specified years.
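
    The record only states that a keyword- and year-restricted PubMed search was implemented in Python; one plausible way to do this with Biopython's Entrez module (an assumption, since the authors do not name their libraries; the e-mail address and query are placeholders) is:

```python
from Bio import Entrez

Entrez.email = "your.name@example.org"  # NCBI requires a contact address

def search_pubmed(keyword, start_year, end_year, retmax=20):
    """Return PubMed IDs for a keyword, restricted to a publication-year range."""
    handle = Entrez.esearch(db="pubmed", term=keyword, retmax=retmax,
                            mindate=str(start_year), maxdate=str(end_year),
                            datetype="pdat")
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]

def fetch_abstracts(pmids):
    """Fetch plain-text abstracts for a list of PubMed IDs."""
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                           rettype="abstract", retmode="text")
    text = handle.read()
    handle.close()
    return text

ids = search_pubmed("DNA methylation text mining", 2005, 2012)
print(len(ids), "records found")
if ids:
    print(fetch_abstracts(ids[:3])[:500])
```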

  12. Web Information Extraction%Web信息抽取

    Institute of Scientific and Technical Information of China (English)

    李晶; 陈恩红

    2003-01-01

    With the tremendous amount of information available on the Web, the ability to quickly obtain information has become a crucial problem. It is not enough for us to acquire information only with Web information retrieval technology. Therefore more and more people pay attention to Web information extraction technology. This paper first introduces some concepts of information extraction technology, then introduces and analyzes several typical Web information extraction methods based on the differences in extraction patterns.

  13. Automated information extraction from web APIs documentation

    OpenAIRE

    Ly, Papa Alioune; Pedrinaci, Carlos; Domingue, John

    2012-01-01

    A fundamental characteristic of Web APIs is the fact that, de facto, providers hardly follow any standard practices while implementing, publishing, and documenting their APIs. As a consequence, the discovery and use of these services by third parties is significantly hampered. In order to achieve further automation while exploiting Web APIs we present an approach for automatically extracting relevant technical information from the Web pages documenting them. In particular we have devised two ...

  14. Unsupervised information extraction by text segmentation

    CERN Document Server

    Cortez, Eli

    2013-01-01

    A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available on pre-existing data to learn how to associate segments in the input string with attributes of a given domain relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to directly learn from test data structure-based features, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a

  15. Spatial information mining and visualization for Qinghai-Tibet Plateau's literature based on GIS

    Science.gov (United States)

    Wang, Xuemei; Ma, Mingguo

    2009-10-01

    Interdisciplinary research has recently become a hot topic. This paper couples bibliometrics with Geographical Information System (GIS) technologies to study spatial information mining and visualization based on the Qinghai-Tibet Plateau literature. All literature on Qinghai-Tibet Plateau research indexed in the ISI Web of Knowledge was collected. Statistical tables about the authors were extracted from the papers using bibliometric methods, the spatial information of the authors' countries was linked to the GIS database, and the spatial distribution was presented as maps based on GIS technologies. Compared with the usual presentation forms of bibliometric analysis, the spatial distribution maps give users richer and more intuitive views.

  16. Mining residential water and electricity demand data in Southern California to inform demand management strategies

    Science.gov (United States)

    Cominola, A.; Spang, E. S.; Giuliani, M.; Castelletti, A.; Loge, F. J.; Lund, J. R.

    2016-12-01

    Demand side management strategies are key to meeting future water and energy demands in urban contexts, promoting water and energy efficiency in the residential sector, providing customized services and communications to consumers, and reducing utilities' costs. Smart metering technologies allow gathering high temporal and spatial resolution water and energy consumption data and support the development of data-driven models of consumers' behavior. Modelling and predicting resource consumption behavior is essential to inform demand management. Yet, analyzing large smart-metered databases requires proper data mining and modelling techniques in order to extract useful information that supports decision makers in spotting end uses towards which water and energy efficiency or conservation efforts should be prioritized. In this study, we consider the following research questions: (i) how is it possible to extract representative consumers' personalities out of big smart metered water and energy data? (ii) are residential water and energy consumption profiles interconnected? (iii) Can we design customized water and energy demand management strategies based on the knowledge of water-energy demand profiles and other user-specific psychographic information? To address the above research questions, we contribute a data-driven approach to identify and model routines in water and energy consumers' behavior. We propose a novel customer segmentation procedure based on data-mining techniques. Our procedure consists of three steps: (i) extraction of typical water-energy consumption profiles for each household, (ii) profiles clustering based on their similarity, and (iii) evaluation of the influence of candidate explanatory variables on the identified clusters. The approach is tested on a dataset of smart metered water and energy consumption data from over 1000 households in South California. Our methodology allows identifying heterogeneous groups of consumers from the studied sample, as well as
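
    A compact sketch of steps (i) and (ii) of the segmentation procedure described above, assuming hourly smart-meter readings have already been reshaped into one average 24-hour profile per household (synthetic data and scikit-learn's KMeans stand in for the authors' clustering choice):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic stand-in: 300 households x 24 hourly averages (litres or kWh).
hours = np.arange(24)
morning = np.exp(-0.5 * ((hours - 7) / 2.0) ** 2)
evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
profiles = np.vstack([
    morning + 0.1 * rng.random((150, 24)),   # "morning peak" households
    evening + 0.1 * rng.random((150, 24)),   # "evening peak" households
])

# Normalise each household's profile so clustering compares shapes, not volumes.
shapes = profiles / profiles.sum(axis=1, keepdims=True)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(shapes)
for k in range(2):
    centroid = shapes[labels == k].mean(axis=0)
    print(f"cluster {k}: {np.sum(labels == k)} households, peak hour {centroid.argmax()}")
```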

  17. Coal mining with Triple-section extraction process in stagger arrangement roadway layout method

    Science.gov (United States)

    Cui, Zimo; Liu, Baozhu; Zhao, Jingli; Chanda, Emmanuel

    2017-03-01

    This paper introduces the Triple-section extraction process within the three-dimensional stagger-arrangement roadway layout method for longwall top-coal caving mining. This 3-D, pillarless roadway layout arranges the air-intake and air-return roadways in different horizons, transforming roadway layout design from a 2D to a 3D system. The paper systematically analyses the geological, technical and economic factors and, on the basis of theoretical study and mining practice, applies this new roadway layout technology to raise the coal recovery ratio and to solve the problems of full-seam mining in thick coal seams. Furthermore, the paper presents a physical simulation of the inner staggered roadway layout of this particular longwall top-coal caving method.

  18. Extracting the information backbone in online system.

    Science.gov (United States)

    Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society and many solutions such as recommender system have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such "less can be more" feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both of their effectiveness and efficiency.
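
    As a rough, illustrative reduction of the idea (not the paper's hybrid algorithm), the sketch below removes a fraction of links from a user-object bipartite network using either a time-aware rule (oldest links first) or a simple topology-aware rule (links attached to the most popular objects first):

```python
from collections import Counter

def extract_backbone(links, fraction=0.2, mode="time"):
    """Remove a fraction of links from a bipartite network of (user, object, timestamp).

    mode="time": drop the oldest links; mode="topology": drop links to the most
    popular objects. Both are simplified stand-ins for the paper's algorithms.
    """
    n_remove = int(len(links) * fraction)
    indexed = list(enumerate(links))
    if mode == "time":
        ranked = sorted(indexed, key=lambda il: il[1][2])               # oldest first
    else:
        popularity = Counter(obj for _, obj, _ in links)
        ranked = sorted(indexed, key=lambda il: -popularity[il[1][1]])  # most popular first
    drop = {i for i, _ in ranked[:n_remove]}
    return [link for i, link in indexed if i not in drop]

links = [("u1", "o1", 1), ("u1", "o2", 5), ("u2", "o1", 2), ("u3", "o1", 3), ("u3", "o3", 4)]
print(extract_backbone(links, fraction=0.4, mode="time"))
print(extract_backbone(links, fraction=0.4, mode="topology"))
```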

  19. Extracting the information backbone in online system

    CERN Document Server

    Zhang, Qian-Ming; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society and many solutions such as recommender system have been proposed to filter out irrelevant information. In the literature, researchers mainly dedicated to improve the recommendation performance (accuracy and diversity) of the algorithms while overlooked the influence of topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. With such "less can be more" feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improve both of...

  20. Knowledge discovery: Extracting usable information from large amounts of data

    Energy Technology Data Exchange (ETDEWEB)

    Whiteson, R.

    1998-12-31

    The threat of nuclear weapons proliferation is a problem of world wide concern. Safeguards are the key to nuclear nonproliferation and data is the key to safeguards. The safeguards community has access to a huge and steadily growing volume of data. The advantages of this data rich environment are obvious, there is a great deal of information which can be utilized. The challenge is to effectively apply proven and developing technologies to find and extract usable information from that data. That information must then be assessed and evaluated to produce the knowledge needed for crucial decision making. Efficient and effective analysis of safeguards data will depend on utilizing technologies to interpret the large, heterogeneous data sets that are available from diverse sources. With an order-of-magnitude increase in the amount of data from a wide variety of technical, textual, and historical sources there is a vital need to apply advanced computer technologies to support all-source analysis. There are techniques of data warehousing, data mining, and data analysis that can provide analysts with tools that will expedite their extracting useable information from the huge amounts of data to which they have access. Computerized tools can aid analysts by integrating heterogeneous data, evaluating diverse data streams, automating retrieval of database information, prioritizing inputs, reconciling conflicting data, doing preliminary interpretations, discovering patterns or trends in data, and automating some of the simpler prescreening tasks that are time consuming and tedious. Thus knowledge discovery technologies can provide a foundation of support for the analyst. Rather than spending time sifting through often irrelevant information, analysts could use their specialized skills in a focused, productive fashion. This would allow them to make their analytical judgments with more confidence and spend more of their time doing what they do best.

  2. A Process Mining Based Service Composition Approach for Mobile Information Systems

    Directory of Open Access Journals (Sweden)

    Chengxi Huang

    2017-01-01

    Full Text Available Due to the growing trend of applying big data and cloud computing technologies in information systems, it is becoming an important issue to handle the connection between large-scale data and the associated business processes in the Internet of Everything (IoE) environment. Service composition, as a widely used phase in system development, has some limits when the complexity of relationships among data increases. Considering the expanding scale and the variety of devices in mobile information systems, a process mining based service composition approach is proposed in this paper in order to improve the adaptiveness and efficiency of compositions. Firstly, a preprocessing is conducted to extract existing service execution information from server-side logs. Then process mining algorithms are applied to discover the overall event sequence with preprocessed data. After that, a scene-based service composition is applied to aggregate scene information and relocate services of the system. Finally, a case study applying the work to a mobile medical application shows that the approach is practical and valuable in improving service composition adaptiveness and efficiency.

  3. Estimating fugitive methane emissions from oil sands mining using extractive core samples

    Science.gov (United States)

    Johnson, Matthew R.; Crosland, Brian M.; McEwen, James D.; Hager, Darcy B.; Armitage, Joshua R.; Karimi-Golpayegani, Mojgan; Picard, David J.

    2016-11-01

    Fugitive methane emissions from oil sands mining activities are a potentially important source of greenhouse gas emissions for which there are significant uncertainties and a lack of open data. This paper investigates the potential of a control-system approach to estimating fugitive methane emissions by analyzing releasable gas volumes in core samples extracted from undeveloped mine regions. Field experiments were performed by leveraging routine winter drilling activities that are a component of normal mine planning and development, and working in conjunction with an on-site drill crew using existing equipment. Core samples were extracted from two test holes, sealed at the surface, and transported for off-site lab analysis. Despite the challenges of the on-site sample collection and the limitations of the available drilling technology, notable quantities of residual methane (mean of 23.8 mgCH4/kg-core-sample (+41%/-35%) or 779 mgCH4/kg-bitumen (+69%/-34%) at 95% confidence) were measured in the collected core samples. If these factors are applied to the volumes of bitumen mined in Alberta in 2015, they imply fugitive methane emissions equivalent to 2.1 MtCO2e (as correlated with bitumen content) or 1.4 MtCO2e (as correlated with total mined material) evaluated on a 100-year time horizon. An additional ∼0.2 Mt of fugitive CO2 emissions could also be expected. Although additional measurements at a larger number of locations are warranted to determine whether these emissions should be considered as additive to, or inclusive of, current estimates based on flux chamber measurements at the mine face, these first-of-their-kind results demonstrate an intriguing alternate method for quantifying fugitive emissions from oil sands mining and extraction.

  4. Suspended electrodialytic extraction of toxic elements for detoxification of three different mine tailings

    DEFF Research Database (Denmark)

    Jensen, Pernille Erland; Ottosen, Lisbeth M.; Hansen, H.K.

    2016-01-01

    Environmental effects of mining activities partly originate from the production of tailings, and the exposure of these to ambient physical and chemical conditions. Removal of toxic elements from tailings prior to deposition could improve environmental performance and reduce risks. Experimental results...... investigated including pre-treatment of the tailings with acid; insertion of bipolar electrodes; and implementation of pulsed or sinusoidal electric fields. In line with these efforts, we investigated the efficiency when extracting toxic elements from a suspension of tailings, rather than from a solid matrix......, which could well be implemented as a final treatment step prior to deposition of tailings. Six electrodialytic experiments in laboratory scale with three different mine tailings (Codelco, Zinkgruvan, and Nalunaq) show that it is possible to extract residual Cu from all three suspended mine...

  5. Extracting a very thick seam at the Stara Jama mine in Yugoslavia

    Energy Technology Data Exchange (ETDEWEB)

    Bijelic, V.

    1986-08-01

    With the general call for a high degree of exploitation of coal deposits, the complete extraction of thick seams is increasingly growing in importance. The following article shows, that the solution of this problem nowadays with fully mechanised longwall installations, as are available at the Stara Jama mine in Zenica, Yugoslavia, can be regarded as an example for the complete extraction of thick coal seams. Since the installation was first put into operation in 1978, the production results have been considerably improved upon.

  6. Cross Lingual Information Retrieval With SMT And Query Mining

    Directory of Open Access Journals (Sweden)

    Suneet Kumar Gupta

    2011-10-01

    Full Text Available In this paper, we take an English corpus and queries in both translated and transliterated form. We use a statistical machine translator to obtain results under the translated and transliterated queries and then analyse those results. These query-wise results are then mined, and a new list of queries is created. We design an experimental setup with a series of steps for calculating Mean Average Precision, using the open-source Terrier platform for information retrieval. On the basis of the newly created query list, we calculate the Mean Average Precision and obtain a significant result of 93.24%, which is very close to the monolingual results calculated for English.
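
    Mean Average Precision, the headline metric of the record, can be computed directly from ranked result lists and relevance judgements; a minimal sketch (with made-up rankings, not the paper's runs) is:

```python
def average_precision(ranked_docs, relevant):
    """Average precision of one ranked result list against a set of relevant doc ids."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank          # precision at this recall point
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_docs, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Illustrative toy data for two queries.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d9", "d4"], {"d5"}),
]
print(f"MAP = {mean_average_precision(runs):.4f}")
```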

  7. Research of Web Data Mining Based on XML%基于XML的Web数据挖掘的研究

    Institute of Scientific and Technical Information of China (English)

    刘振岩; 王万森

    2003-01-01

    The paper advances a system framework of Web data mining based on XML. This system framework integrates Information Retrieval with Information Extraction, and utilizes traditional data mining methods to complete Web data mining through XML.

  8. Digital image processing for information extraction.

    Science.gov (United States)

    Billingsley, F. C.

    1973-01-01

    The modern digital computer has made practical image processing techniques for handling nonlinear operations in both the geometrical and the intensity domains, various types of nonuniform noise cleanup, and the numerical analysis of pictures. An initial requirement is that a number of anomalies caused by the camera (e.g., geometric distortion, MTF roll-off, vignetting, and nonuniform intensity response) must be taken into account or removed to avoid their interference with the information extraction process. Examples illustrating these operations are discussed along with computer techniques used to emphasize details, perform analyses, classify materials by multivariate analysis, detect temporal differences, and aid in human interpretation of photos.

  9. Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval

    Directory of Open Access Journals (Sweden)

    P. R. Kumar

    2010-01-01

    Full Text Available Problem statement: A study of hyperlink analysis and the algorithms used for link analysis in Web information retrieval was conducted. Approach: This research was motivated by the reliance on search engines for information retrieval on the web. The aim was to understand web structure mining and determine the importance of hyperlinks in web information retrieval, particularly with the Google search engine. Hyperlink analysis is an important methodology used by Google to rank pages. Results: The different algorithms used for link analysis, namely PageRank (PR), Weighted PageRank (WPR) and Hyperlink-Induced Topic Search (HITS), are discussed and compared. The PageRank algorithm was implemented in a Java program, and the convergence of the PageRank values is shown in chart form. Conclusion: This study explored and compared link-structure algorithms for ranking. Further research in this area will address the problems facing the PageRank algorithm and how to handle them.
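
    The record compares PageRank, Weighted PageRank and HITS and reports PageRank convergence; the abstract's implementation was in Java, but a hedged Python sketch of the standard damped power-iteration form of PageRank (a textbook version, not the paper's code) looks like this:

```python
def pagerank(out_links, damping=0.85, tol=1e-8, max_iter=100):
    """Iterative PageRank over a dict {page: [pages it links to]}."""
    pages = set(out_links) | {q for targets in out_links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, targets in out_links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:  # dangling page: spread its rank over all pages
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:  # convergence check
            rank = new_rank
            break
        rank = new_rank
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```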

  10. Effect of coal mine dust and clay extracts on the biological activity of the quartz surface.

    Science.gov (United States)

    Stone, V; Jones, R; Rollo, K; Duffin, R; Donaldson, K; Brown, D M

    2004-04-01

    Modification of the quartz surface by aluminium salts and metallic iron has been shown to reduce the biological activity of quartz. This study aimed to investigate the ability of water soluble extracts of coal mine dust (CMD), low aluminium clays (hectorite and montmorillonite) and high aluminium clays (attapulgite and kaolin) to inhibit the reactivity of the quartz surface. DQ12 induced significant haemolysis of sheep erythrocytes in vitro and inflammation in vivo as indicated by increases in the total cell numbers, neutrophil cell numbers, MIP-2 protein and albumin content of bronchoalveolar lavage (BAL) fluid. Treatment of DQ12 with CMD extract prevented both haemolysis and inflammation. Extracts of the high aluminium clays (kaolin and attapulgite) produced inhibition of DQ12 induced haemolysis, and the kaolin extract inhibited quartz driven inflammation. DQ12 induced haemolysis by coal mine dust and kaolin extract could be prevented by pre-treatment of the extracts with a cation chelator. Extracts of the low aluminium clays (montmorillonite and hectorite) did not prevent DQ12 induced haemolysis, although the hectorite extract did prevent inflammation. These results suggest that CMD, and clays both low and rich in aluminium, all contain soluble components (possibly cations) capable of masking the reactivity of the quartz surface.

  11. Data mining in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Ruxandra-Ştefania PETRE

    2012-10-01

    Full Text Available This paper describes how data mining is used in cloud computing. Data mining is used for extracting potentially useful information from raw data. The integration of data mining techniques into normal day-to-day activities has become commonplace. Every day people are confronted with targeted advertising, and data mining techniques help businesses become more efficient by reducing costs. Data mining techniques and applications are very much needed in the cloud computing paradigm. Implementing data mining techniques through cloud computing allows users to retrieve meaningful information from virtually integrated data warehouses, which reduces the costs of infrastructure and storage.

  12. Extraction of information from unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H.; DeLand, S.M.; Crowder, S.V.

    1995-11-01

    Extracting information from unstructured text has become an emphasis in recent years due to the large amount of text now electronically available. This status report describes the findings and work done by the end of the first year of a two-year LDRD. Requirements of the approach included that it model the information in a domain independent way. This means that it would differ from current systems by not relying on previously built domain knowledge and that it would do more than keyword identification. Three areas that are discussed and expected to contribute to a solution include (1) identifying key entities through document level profiling and preprocessing, (2) identifying relationships between entities through sentence level syntax, and (3) combining the first two with semantic knowledge about the terms.
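
    The three areas listed above can be illustrated with a small, hedged sketch that is not the LDRD prototype itself: named entities stand in for document-level profiling and a sentence's root verb stands in for sentence-level syntax. It assumes spaCy and its small English model are installed, and the example sentence is invented.

    # Illustrative sketch only: identify entities, then relate entity pairs
    # in the same sentence through that sentence's root verb.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def entity_pairs(text):
        doc = nlp(text)
        pairs = []
        for sent in doc.sents:
            ents = list(sent.ents)
            for i, e1 in enumerate(ents):
                for e2 in ents[i + 1:]:
                    pairs.append((e1.text, sent.root.lemma_, e2.text))
        return pairs

    print(entity_pairs("Sandia National Laboratories funded the two-year project in 1995."))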

  13. Neural Network Based Algorithm and Simulation of Information Fusion in the Coal Mine

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    The concepts of information fusion and the basic principles of neural networks are introduced. Neural networks were introduced as a way of building an information fusion model in a coal mine monitoring system. This assures the accurate transmission of the multi-sensor information that comes from the coal mine monitoring systems. The information fusion mode was analyzed. An algorithm was designed based on this analysis and some simulation results were given. Finally, conclusions that could provide auxiliary decision-making information to coal mine dispatching officers were presented.
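
    As a hedged sketch of the information-fusion idea described above (not the paper's algorithm), the snippet below trains a small neural network to fuse three simulated sensor channels from a monitoring system into a single alarm decision. The channel names, the toy hazard rule and all data are invented for illustration.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.random((200, 3))                             # columns: gas, airflow, temperature (normalised)
    y = ((X[:, 0] > 0.7) & (X[:, 1] < 0.3)).astype(int)  # toy "hazard" rule the network must learn

    fusion_net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    fusion_net.fit(X, y)
    print(fusion_net.predict([[0.9, 0.2, 0.5]]))         # fused decision for one new reading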

  14. Extracting the information backbone in online system.

    Directory of Open Access Journals (Sweden)

    Qian-Ming Zhang

    Full Text Available Information overload is a serious problem in modern society and many solutions, such as recommender systems, have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity of the algorithms while they have overlooked the influence of the topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. Exploiting this "less can be more" feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both its effectiveness and its efficiency.
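
    A hedged sketch of the time-aware half of such link removal is shown below: it simply keeps each user's most recent links, whereas the paper's hybrid criterion also uses topology. The data layout and the keep_per_user parameter are assumptions for illustration.

    def time_aware_backbone(links, keep_per_user=10):
        """links: list of (user, object, timestamp) tuples from the bipartite network."""
        by_user = {}
        for user, obj, ts in links:
            by_user.setdefault(user, []).append((ts, obj))
        backbone = []
        for user, items in by_user.items():
            # keep only each user's most recent links
            for ts, obj in sorted(items, reverse=True)[:keep_per_user]:
                backbone.append((user, obj, ts))
        return backbone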

  15. Extracting the Information Backbone in Online System

    Science.gov (United States)

    Zhang, Qian-Ming; Zeng, An; Shang, Ming-Sheng

    2013-01-01

    Information overload is a serious problem in modern society and many solutions, such as recommender systems, have been proposed to filter out irrelevant information. In the literature, researchers have been mainly dedicated to improving the recommendation performance (accuracy and diversity) of the algorithms while they have overlooked the influence of the topology of the online user-object bipartite networks. In this paper, we find that some information provided by the bipartite networks is not only redundant but also misleading. Exploiting this “less can be more” feature, we design some algorithms to improve the recommendation performance by eliminating some links from the original networks. Moreover, we propose a hybrid method combining the time-aware and topology-aware link removal algorithms to extract the backbone which contains the essential information for the recommender systems. From the practical point of view, our method can improve the performance and reduce the computational time of the recommendation system, thus improving both its effectiveness and its efficiency. PMID:23690946

  16. Intelligent Information Retrieval and Web Mining Architecture Using SOA

    Science.gov (United States)

    El-Bathy, Naser Ibrahim

    2010-01-01

    The study of this dissertation provides a solution to a very specific problem instance in the area of data mining, data warehousing, and service-oriented architecture in publishing and newspaper industries. The research question focuses on the integration of data mining and data warehousing. The research problem focuses on the development of…

  17. Using Open Web APIs in Teaching Web Mining

    Science.gov (United States)

    Chen, Hsinchun; Li, Xin; Chau, M.; Ho, Yi-Jen; Tseng, Chunju

    2009-01-01

    With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems…

  18. Using Open Web APIs in Teaching Web Mining

    Science.gov (United States)

    Chen, Hsinchun; Li, Xin; Chau, M.; Ho, Yi-Jen; Tseng, Chunju

    2009-01-01

    With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems…

  19. Numerical Investigation of the Dynamic Mechanical State of a Coal Pillar During Longwall Mining Panel Extraction

    Science.gov (United States)

    Wang, Hongwei; Jiang, Yaodong; Zhao, Yixin; Zhu, Jie; Liu, Shuai

    2013-09-01

    This study presents a numerical investigation on the dynamic mechanical state of a coal pillar and the assessment of the coal bump risk during extraction using the longwall mining method. The present research indicates that there is an intact core, even when the peak pillar strength has been exceeded under uniaxial compression. This central portion of the coal pillar plays a significant role in its loading capacity. In this study, the intact core of the coal pillar is defined as an elastic core. Based on the geological conditions of a typical longwall panel from the Tangshan coal mine in the City of Tangshan, China, a numerical fast Lagrangian analysis of continua in three dimensions (FLAC3D) model was created to understand the relationship between the volume of the elastic core in a coal pillar and the vertical stress, which is considered to be an important precursor to the development of a coal bump. The numerical results suggest that, the wider the coal pillar, the greater the volume of the elastic core. Therefore, a coal pillar with large width may form a large elastic core as the panel is mined, and the vertical stress is expected to be greater in magnitude. Because of the high stresses and the associated stored elastic energy, the risk of coal bumps in a coal pillar with large width is greater than for a coal pillar with small width. The results of the model also predict that the peak abutment stress occurs near the intersection between the mining face and the roadways at a distance of 7.5 m from the mining face. It is revealed that the bump-prone zones around the longwall panel are within 7-10 m ahead of the mining face and near the edge of the roadway during panel extraction.

  20. Data Mining and Information Technology: Its Impact on Intelligence Collection and Privacy Rights

    Science.gov (United States)

    2007-11-26

    Modern Information Technology (IT) has radically magnified the capability and power of data mining. At a time when the threat environment has shifted...in emphasis to COIN, terrorism, and cyber war, IT-enhanced data mining capabilities could provide some of the critical intelligence demanded by these...threatened. This paper establishes the intersection between the capability and need for data mining and the suitability of existing policy to enable its

  1. Zonal extraction technology and numerical simulation analysis in open pit coal mine

    Institute of Scientific and Technical Information of China (English)

    Chen Yanlong; Cai Qingxiang; Shang Tao; Peng Hongge; Zhou Wei; Chen Shuzhao

    2012-01-01

    In order to enhance the coal recovery ratio of open pit coal mines, a new extraction method called the zonal mining system for residual coal around the end-walls is presented. The mining system can improve economic benefits by excavating haulage and ventilation roadways from the exposed position of the coal seams and utilizing the existing transportation systems. Moreover, the main mining parameters are also discussed. The outcome shows that the load on the coal seam roof is about 0.307 MPa and the drop step of the coal seam roof about 20.3 m when the thickness of cover and the average volume weight are about 120 m and 0.023 MN/m³ respectively. With the increase of mining height and width, the coal recovery ratio can be improved. However, when the recovery ratio is more than 0.85, the average stress on the coal pillar increases sharply, so the recovery ratio should be controlled to keep the coal seam roof safe. Based on the numerical simulation results, it is concluded that the ratio of coal pillar width to height should be more than 1.0 to ensure that the coal pillars are stable, and there are only minor displacements on the end-walls.

  2. Extraction procedure testing of solid wastes generated at selected metal ore mines and mills

    Science.gov (United States)

    Harty, David M.; Terlecky, P. Michael

    1986-09-01

    Solid waste samples from a reconnaissance study conducted at ore mining and milling sites were subjected to the U.S. Environmental Protection Agency extraction procedure (EP) leaching test. Sites visited included mines and mills extracting ores of antimony (Sb), mercury (Hg), vanadium (V), tungsten (W), and nickel (Ni). Samples analyzed included mine wastes, treatment pond solids, tailings, low grade ore, and other solid wastes generated at these facilities. Analysis of the leachate from these tests indicates that none of the samples generated leachate in which the concentration of any toxic metal parameter exceeded EPA criteria levels for those metals. By volume, tailings generally constitute the largest amount of solid wastes generated, but these data indicate that with proper management and monitoring, current EPA criteria can be met for tailings and for most solid wastes associated with mining and milling of these metal ores. Long-term studies are needed to determine if leachate characteristics change with time and to assist in the development of closure plans and post-closure monitoring programs.

  3. Extraction of quantifiable information from complex systems

    CERN Document Server

    Dahmen, Wolfgang; Griebel, Michael; Hackbusch, Wolfgang; Ritter, Klaus; Schneider, Reinhold; Schwab, Christoph; Yserentant, Harry

    2014-01-01

    In April 2007, the  Deutsche Forschungsgemeinschaft (DFG) approved the  Priority Program 1324 “Mathematical Methods for Extracting Quantifiable Information from Complex Systems.” This volume presents a comprehensive overview of the most important results obtained over the course of the program.   Mathematical models of complex systems provide the foundation for further technological developments in science, engineering and computational finance.  Motivated by the trend toward steadily increasing computer power, ever more realistic models have been developed in recent years. These models have also become increasingly complex, and their numerical treatment poses serious challenges.   Recent developments in mathematics suggest that, in the long run, much more powerful numerical solution strategies could be derived if the interconnections between the different fields of research were systematically exploited at a conceptual level. Accordingly, a deeper understanding of the mathematical foundations as w...

  4. Accumulation of some metals by legumes and their extractability from acid mine spoils. [USA - Alabama

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, R.W.; Ibeabuchi, I.O.; Sistani, K.R.; Shuford, J.W. (Alabama A M University, Normal, AL (USA). Dept. of Plant and Soil Science)

    A greenhouse study was conducted to investigate the growth (dry matter yield) of selected legume cover crops; the phytoaccumulation of metals such as Zn, Mn, Pb, Cu, Ni, and Al; and the extractability of heavy metals from three different Alabama acid mine spoils. The spoils were amended based on soil test recommended levels of N, P, K, Ca and Mg prior to plant growth. Metals were extracted by three extractants (Mehlich 1, DTPA, and 0.1 M HCl) and values correlated with their accumulation by the selected legumes. Among the cover crops, kobe lespedeza (Lespedeza striata (Thunb.) Hook and Arn), sericea lespedeza (Lespedeza cuneata (Dum.) G. Don), and red clover (Trifolium pratense L.) did not survive the stressful conditions of the spoils. However, cowpea (Vigna unguiculata L.), followed by 'Bragg' soybean (Glycine max (L.) Merr.), generally produced the highest dry matter yield while accumulating the largest quantity of metals, except Al, from the spoils. The extractability of most metals from the spoils was generally in the order 0.1 M HCl > DTPA. Mehlich 1 did not extract Pb and 0.1 M HCl did not extract Ni, whereas DTPA extracted all the metals in small amounts relative to HCl and Mehlich 1. All the extractants were quite effective in removing plant-available Zn from the spoils. In general, the extractants' ability to predict plant-available metals depended on the crop species, spoil type, and extractant used. 28 refs., 4 tabs.

  5. US uranium mining industry: background information on economics and emissions

    Energy Technology Data Exchange (ETDEWEB)

    Bruno, G.A.; Dirks, J.A.; Jackson, P.O.; Young, J.K.

    1984-03-01

    A review of the US uranium mining industry has revealed a generally depressed industry situation. The 1982 U₃O₈ production from open-pit and underground mines declined to 3800 and 6300 tons respectively, with the underground portion representing 46% of total production. US exploration and development continued downward in 1982. Employment in the mining and milling sectors dropped 31% and 17% respectively in 1982. Representative forecasts were developed for reactor fuel demand and U₃O₈ production for the years 1983 and 1990. Reactor fuel demand is estimated to increase from 15,900 tons to 21,300 tons U₃O₈ respectively. U₃O₈ production, however, is estimated to decrease from 10,600 tons to 9600 tons respectively. A field examination was conducted of 29 selected underground uranium mines that represent 84% of the 1982 underground production. Data were gathered regarding population, land ownership and private property valuation. An analysis of the increased cost of production resulting from the installation of 20-meter high exhaust borehole vent stacks was conducted. An assessment was made of the current and future ²²²Rn emission levels for a group of 27 uranium mines. It is shown that ²²²Rn emission rates from 10 individual operating mines are increasing through 1990 by 1.2 to 3.8 times, but for the group of 27 mines as a whole a reduction of total ²²²Rn emissions is predicted, because 17 of the mines will be shut down and sealed. The estimated total ²²²Rn emission rate for this group of mines will be 105 Ci/yr by year end 1983, or 70% of the 1978-79 measured rate, and 124 Ci/yr by year end 1990, or 83% of the 1978-79 measured rate.

  6. 75 FR 3753 - Agency Information Collection Activities: Comment Request for the USGS Mine, Development, and...

    Science.gov (United States)

    2010-01-22

    ... U.S. Geological Survey Agency Information Collection Activities: Comment Request for the USGS Mine, Development, and Mineral Exploration Supplement AGENCY: U.S. Geological Survey (USGS), Interior. ACTION... paperwork requirements for the USGS Mine, Development, and Mineral Exploration Supplement. This collection...

  7. A New Framework for Textual Information Mining over Parse Trees. CRESST Report 805

    Science.gov (United States)

    Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

    2011-01-01

    Textual information mining is a challenging problem that has resulted in the creation of many different rule-based linguistic query languages. However, these languages generally are not optimized for the purpose of text mining. In other words, they usually consider queries as individuals and only return raw results for each query. Moreover they…

  8. SEMANTIC INFORMATION EXTRACTION IN UNIVERSITY DOMAIN

    Directory of Open Access Journals (Sweden)

    Swathi

    2012-07-01

    Full Text Available Today's conventional search engines hardly provide the essential content relevant to the user's search query, because the context and semantics of the user's request are not analyzed to the full extent. Hence the need for a semantic web search arises. Semantic web search (SWS) is an emerging area of web search which combines Natural Language Processing and Artificial Intelligence. The objective of the work done here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as a knowledge base for the information retrieval process. It is not a mere keyword search; it works one layer above what Google or any other search engine retrieves by analyzing just the keywords. Here the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion. The results obtained are accurate enough to satisfy the request made by the user, and the level of accuracy is enhanced because the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links. For ranking, an algorithm has been applied which fetches more apt results for the user query.
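
    The keyword-expansion step can be pictured with the hedged sketch below; the hand-written synonym map stands in for the university-domain ontology and is an assumption rather than the paper's actual resource.

    # Hedged sketch of ontology-driven keyword expansion before a web search.
    UNIVERSITY_ONTOLOGY = {
        "lecturer": ["professor", "faculty", "instructor"],
        "course": ["module", "subject", "class"],
    }

    def expand_query(query):
        terms = query.lower().split()
        expanded = list(terms)
        for term in terms:
            expanded.extend(UNIVERSITY_ONTOLOGY.get(term, []))
        return " OR ".join(dict.fromkeys(expanded))  # deduplicate while keeping order

    print(expand_query("course lecturer list"))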

  9. EXTRACTING HUMAN BEHAVIORAL PATTERNS BY MINING GEO-SOCIAL NETWORKS

    Directory of Open Access Journals (Sweden)

    M. Forghani

    2014-10-01

    Full Text Available The accessibility of positioning technologies such as GPS offers the opportunity to record one's travel experience and publish it on the web. Using this feature in web-based social networks, and considering the location information shared by users as a bridge connecting the users' network to a location information layer, leads to the formation of geo-social networks. The availability of large amounts of geographical and social data on these networks provides rich sources of information that can be utilized for studying human behavior through data analysis in a spatial-temporal-social context. This paper investigates the behavior of around 1150 users of the Foursquare network by making use of their check-ins. The authors analyzed the metadata associated with the whereabouts of the users, with an emphasis on the type of places, to uncover patterns of venue category usage across different temporal and geographical scales. The authors found five groups of meaningful patterns that can explain region characteristics and recognize a number of major crowd behaviors that recur over time and space.

  10. Impact of historical mining assessed in soils by kinetic extraction and lead isotopic ratios

    Energy Technology Data Exchange (ETDEWEB)

    Camizuli, E., E-mail: estelle.camizuli@u-bourgogne.fr [UMR 6298, ArTeHiS, Université de Bourgogne — CNRS — Culture, 6 bd Gabriel, Bat. Gabriel, 21000 Dijon (France); Monna, F. [UMR 6298, ArTeHiS, Université de Bourgogne — CNRS — Culture, 6 bd Gabriel, Bat. Gabriel, 21000 Dijon (France); Bermond, A.; Manouchehri, N.; Besançon, S. [Institut des sciences et industries du vivant et de l' environnement (AgroParisTech), Laboratoire de Chimie Analytique, 16, rue Claude Bernard, 75231 Paris Cedex 05 (France); Losno, R. [UMR 7583, LISA, Universités Paris 7-Paris 12 — CNRS, 61 av. du Gal de Gaulle, 94010 Créteil Cedex (France); Oort, F. van [UR 251, Pessac, Institut National de la Recherche Agronomique, Centre de Versailles-Grignon, RD 10, 78026 Versailles Cedex (France); Labanowski, J. [UMR 7285, IC2MP, Université de Poitiers — CNRS, 4, rue Michel Brunet, 86022 Poitiers (France); Perreira, A. [UMR 6298, ArTeHiS, Université de Bourgogne — CNRS — Culture, 6 bd Gabriel, Bat. Gabriel, 21000 Dijon (France); Chateau, C. [UFR SVTE, Université de Bourgogne, 6 bd Gabriel, Bat. Gabriel, 21000 Dijon (France); Alibert, P. [UMR 6282, Biogeosciences, Université de Bourgogne — CNRS, 6 bd Gabriel, Bat. Gabriel, 21000 Dijon (France)

    2014-02-01

    The aim of this study is to estimate the long-term behaviour of trace metals in two soils differently impacted by past mining. Topsoils from two 1 km² zones in the forested Morvan massif (France) were sampled to assess the spatial distribution of Cd, Cu, Pb and Zn. The first zone had been contaminated by historical mining. As expected, it exhibits higher trace-metal levels and greater spatial heterogeneity than the second non-contaminated zone, supposed to represent the local background. One soil profile from each zone was investigated in detail to estimate metal behaviour, and hence, bioavailability. Kinetic extractions were performed using EDTA on three samples: the A horizon from both soil profiles and the B horizon from the contaminated soil. For all three samples, kinetic extractions can be modelled by two first-order reactions. Similar kinetic behaviour was observed for all metals, but more metal was extracted from the contaminated A horizon than from the B horizon. More surprising is the general predominance of the residual fraction over the “labile” and “less labile” pools. Past anthropogenic inputs may have percolated over time through the soil profiles because of acidic pH conditions. Stable organo-metallic complexes may also have been formed over time, reducing metal availability. These processes are not mutually exclusive. After kinetic extraction, the lead isotopic compositions of the samples exhibited different signatures, related to contamination history and intrinsic soil parameters. However, no variation in lead signature was observed during the extraction experiment, demonstrating that the “labile” and “less labile” lead pools do not differ in terms of origin. Even if trace metals resulting from past mining and metallurgy persist in soils long after these activities have ceased, kinetic extractions suggest that metals, at least for these particular forest soils, do not represent a threat for biota.

  11. Ground Deformation Extraction Using Visible Images and LIDAR Data in Mining Area

    Science.gov (United States)

    Hu, Wenmin; Wu, Lixin

    2016-06-01

    Recognition and extraction of mining ground deformation can help us understand the deformation process and its spatial distribution, and estimate deformation laws and trends. This study focuses on the detection and extraction of ground deformation by combining high-resolution visible stereo imagery, LiDAR point cloud data and historical data. The DEM of a large mining area is generated using high-resolution satellite stereo images, and ground deformation is obtained through time series analysis combined with historical DEM data. Ground deformation caused by mining activities is detected and analyzed to explain the link between regional ground deformation and local deformation. A district covering 200 km² around the West Open Pit Mine in Fushun, Liaoning province, a city in Northeast China, is chosen as the test area. Regional and local ground deformation in the 2010-2015 time series is detected and extracted with DEMs derived from ZY-3 images and LiDAR point DEMs in the case study. Results show that the mean regional deformation is a 7.1 m rise in elevation with an RMS of 9.6 m. Deformation of rising elevation and deformation of declining elevation are coupled in local areas. The area of higher elevation variation is 16.3 km² and the mean rising value is 35.8 m with an RMS of 15.7 m, while the deformation area of lower elevation variation is 6.8 km² and the mean declining value is 17.6 m with an RMS of 9.3 m. Moreover, local large deformation and regional slow deformation are coupled; the deformation from local mining activities has expanded to the surrounding area, and a large ground fracture with declining elevation has been detected and extracted in the south of the West Open Pit Mine, with a mean declining elevation of 23.1 m and covering about 2.3 km² by 2015. The results in this paper are currently preliminary; we are working to improve their precision with invariant ground control data for validation.
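
    The core DEM-differencing step can be sketched with NumPy as below; the synthetic grids, the 30 m cell size and the 5 m change threshold are assumptions for illustration and do not reproduce the study's processing chain.

    import numpy as np

    def deformation_summary(dem_old, dem_new, cell_area_m2=900.0, threshold_m=5.0):
        diff = dem_new - dem_old                      # elevation change per cell
        rising = diff > threshold_m
        declining = diff < -threshold_m
        return {
            "mean_change_m": float(diff.mean()),
            "rms_m": float(np.sqrt((diff ** 2).mean())),
            "rising_area_km2": rising.sum() * cell_area_m2 / 1e6,
            "declining_area_km2": declining.sum() * cell_area_m2 / 1e6,
        }

    rng = np.random.default_rng(1)
    dem_2010 = rng.normal(100.0, 5.0, (500, 500))     # synthetic historical DEM
    dem_2015 = dem_2010 + rng.normal(2.0, 8.0, (500, 500))
    print(deformation_summary(dem_2010, dem_2015))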

  12. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection.

    Science.gov (United States)

    Botsis, Taxiarchis; Nguyen, Michael D; Woo, Emily Jane; Markatou, Marianthi; Ball, Robert

    2011-01-01

    The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload. We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N(pos)=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords, low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations. Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed. Rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, however at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively). Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
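
    The general strategy of informative feature selection feeding a classifier can be sketched with scikit-learn as below; this is not the authors' multi-level pipeline, and the four labelled reports are invented toy data.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    reports = [
        "hives and swelling of the lips within minutes of vaccination",
        "mild soreness at the injection site, resolved next day",
        "difficulty breathing and throat tightness shortly after dose",
        "low grade fever the following evening",
    ]
    labels = [1, 0, 1, 0]  # 1 = potentially positive for anaphylaxis, 0 = negative

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        SelectKBest(chi2, k=10),      # keep only the most informative features
                        LinearSVC())
    clf.fit(reports, labels)
    print(clf.predict(["throat swelling and trouble breathing after vaccine"]))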

  13. Sequential Extraction Results and Mineralogy of Mine Waste and Stream Sediments Associated With Metal Mines in Vermont, Maine, and New Zealand

    Science.gov (United States)

    Piatak, N.M.; Seal, R.R.; Sanzolone, R.F.; Lamothe, P.J.; Brown, Z.A.; Adams, M.

    2007-01-01

    We report results from sequential extraction experiments and the quantitative mineralogy for samples of stream sediments and mine wastes collected from metal mines. Samples were from the Elizabeth, Ely Copper, and Pike Hill Copper mines in Vermont, the Callahan Mine in Maine, and the Martha Mine in New Zealand. The extraction technique targeted the following operationally defined fractions and solid-phase forms: (1) soluble, adsorbed, and exchangeable fractions; (2) carbonates; (3) organic material; (4) amorphous iron- and aluminum-hydroxides and crystalline manganese-oxides; (5) crystalline iron-oxides; (6) sulfides and selenides; and (7) residual material. For most elements, the sum of an element from all extraction steps correlated well with the original unleached concentration. Also, the quantitative mineralogy of the original material compared to that of the residues from two extraction steps gave insight into the effectiveness of reagents at dissolving targeted phases. The data are presented here with minimal interpretation or discussion; further analyses and interpretation will be presented elsewhere.

  14. Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts.

    Science.gov (United States)

    Cellier, Peggy; Charnois, Thierry; Plantevit, Marc; Rigotti, Christophe; Crémilleux, Bruno; Gandrillon, Olivier; Kléma, Jiří; Manguin, Jean-Luc

    2015-01-01

    Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Text collections are large, and it is very difficult for biologists to take full benefit from this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and thus are time consuming and often devoted to a specific corpus. Machine learning based NLP methods give good results but generate outcomes that are not really understandable by a user. We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method to automatically produce patterns conveying gene interactions and their characterizations. Therefore, our method allows not only gene interactions but also semantic information on the extracted interactions (e.g., modalities, biological contexts, interaction types) to be detected. Only a limited resource is required: the text collection that is used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and is even better for gene interaction detection in AIMed. Experiments show how our approach enables interactions and their characterizations to be discovered. To the best of our knowledge, there are few methods that automatically extract the interactions together with the associated semantic information. The extracted gene interactions from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22.
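
    A much-simplified sketch of the sequential-pattern idea is given below: gene names are normalised to a placeholder and frequent ordered word pairs (gaps allowed) are counted across sentences. The sentences, gene list and support threshold are invented, and the authors' actual miner handles far richer patterns.

    from itertools import combinations
    from collections import Counter

    sentences = [
        "BRCA1 activates TP53 in stressed cells".split(),
        "MYC strongly activates E2F1".split(),
        "PTEN may inhibit AKT1".split(),
    ]
    GENES = {"BRCA1", "TP53", "MYC", "E2F1", "PTEN", "AKT1"}

    def normalise(tokens):
        return ["<GENE>" if tok in GENES else tok for tok in tokens]

    pair_counts = Counter()
    for sent in sentences:
        toks = normalise(sent)
        for w1, w2 in combinations(toks, 2):   # ordered pairs, gaps allowed
            pair_counts[(w1, w2)] += 1

    min_support = 2
    print({p: c for p, c in pair_counts.items() if c >= min_support})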

  15. Geographical Information System Model for Potential Mines Data Management Presentation in Kabupaten Gorontalo

    Science.gov (United States)

    Roviana, D.; Tajuddin, A.; Edi, S.

    2017-03-01

    Mining potential in Indonesia is very abundant, ranging from Sabang to Merauke. Kabupaten Gorontalo is one of many places in Indonesia where different types of minerals and natural resources can be found in every district. The abundance of mining potential must be balanced with good management and ease of access to information for investors. The current issues are: (1) the way data and information about potential mining areas are presented is still manual (maps captured from satellite images are printed and attached to an information board in the office), which makes it difficult to obtain information; (2) the high cost of map printing; and (3) the difficulty for the regency leader (bupati) of obtaining information for strategic decision making about mining potential. The goal of this research is to build a model of a Geographical Information System that provides data management of mining potential, so that investors can easily get the information they need. To achieve that goal, the Research and Development method is used. The result of this research is a model of a Geographical Information System, implemented in an application, for presenting and managing mining data.

  16. Summary of fish and wildlife information needs to surface mine coal in the United States. Part 3. A handbook for meeting fish and wildlife information needs to surface mine coal: OSM Region V. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Hinkle, C.R.; Ambrose, R.E.; Wenzel, C.R.

    1981-02-01

    This report contains information to assist in protecting, enhancing, and reducing impacts to fish and wildlife resources during surface mining of coal. It gives information on the premining, mining, reclamation and compliance phases of surface mining. This volume is specifically for the states of Washington, Idaho, Montana, North Dakota, South Dakota, Wyoming, Oregon, California, Nevada, Utah, Colorado, Arizona and New Mexico.

  17. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    Science.gov (United States)

    Huang, Zhong

    2014-01-01

    Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…

  18. Biomedical Information Extraction: Mining Disease Associated Genes from Literature

    Science.gov (United States)

    Huang, Zhong

    2014-01-01

    Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…

  19. Respiratory Information Extraction from Electrocardiogram Signals

    KAUST Repository

    Amin, Gamal El Din Fathy

    2010-12-01

    The Electrocardiogram (ECG) is a tool measuring the electrical activity of the heart, and it is extensively used for diagnosis and monitoring of heart diseases. The ECG signal reflects not only the heart activity but also many other physiological processes. The respiratory activity is a prominent process that affects the ECG signal due to the close proximity of the heart and the lungs. In this thesis, several methods for the extraction of respiratory process information from the ECG signal are presented. These methods allow an estimation of the lung volume and the lung pressure from the ECG signal. The potential benefit of this is to eliminate the corresponding sensors used to measure the respiration activity. A reduction of the number of sensors connected to patients will increase patients’ comfort and reduce the costs associated with healthcare. As a further result, the efficiency of diagnosing respirational disorders will increase since the respiration activity can be monitored with a common, widely available method. The developed methods can also improve the detection of respirational disorders that occur while patients are sleeping. Such disorders are commonly diagnosed in sleeping laboratories where the patients are connected to a number of different sensors. Any reduction of these sensors will result in a more natural sleeping environment for the patients and hence a higher sensitivity of the diagnosis.
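
    One widely used ECG-derived respiration idea can be sketched as follows: respiration modulates R-peak amplitudes, so the peak-amplitude envelope tracks breathing. The synthetic signal, sampling rate and toy heart rhythm are assumptions for illustration; the thesis itself develops several such methods.

    import numpy as np
    from scipy.signal import find_peaks

    fs = 250                                          # sampling rate in Hz (assumed)
    t = np.arange(0, 30, 1 / fs)
    resp = 0.2 * np.sin(2 * np.pi * 0.25 * t)         # 15 breaths/min modulation
    ecg = np.zeros_like(t)
    beat_idx = np.arange(fs, len(t), fs)              # one toy beat per second
    ecg[beat_idx] = 1.0 + resp[beat_idx]              # R peaks amplitude-modulated by respiration

    peaks, _ = find_peaks(ecg, height=0.5)
    edr_times, edr_amplitudes = t[peaks], ecg[peaks]  # ECG-derived respiration samples
    print(len(peaks), "beats; first EDR amplitudes:", np.round(edr_amplitudes[:5], 3))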

  20. Technique of coal mining and gas extraction without coal pillar in multi-seam with low permeability

    Energy Technology Data Exchange (ETDEWEB)

    Liang Yuan [Huainan Mining (Group) Co. Ltd., Huainan (China)

    2009-06-15

    Aimed at the low mining efficiency in deep multi-seams caused by high crustal stress, high gas content, low permeability, the compound 'three soft' roof and the troublesome safety situation encountered in deep-level coal exploitation, a new idea of gob-side entry retaining without a coal pillar and Y-style ventilation in the first-mined key pressure-relieved coal seam, together with a new method of coal mining and gas extraction, was proposed. The following were discovered: the dynamic evolution law of the crannies in the roof as influenced by mining, the formative rule of 'the vertical cranny-abundant area' along the gob side, the distribution of the air pressure field in the gob, and the flowing rule of pressure-relieved gas in a Y-style ventilation system. The study also established a theoretical basis for a new method of coal mining and gas extraction in which the pressure-relieved gas is extracted by roadway-retaining boreholes instead of roadway boreholes. The study resolved many difficult key problems, such as fast roadway retaining at the gob side without a coal pillar, Y-style ventilation, and extraction of pressure-relieved gas by roadway-retaining boreholes. The study innovated and integrated a whole set of technical systems for coal and pressure-relief gas extraction. The method of pressure-relieved gas extraction by roadway retaining has been successfully applied at 6 typical working faces in the Huainan and Huaibei mining areas. The research can provide scientific and reliable technical support and a demonstration for coal mining and gas extraction in gassy deep multi-seams with low permeability. 9 refs., 7 figs.

  1. Technique of coal mining and gas extraction without coal pillar in multi-seam with low permeability

    Institute of Scientific and Technical Information of China (English)

    YUAN Liang

    2009-01-01

    Aimed at the low mining efficiency in deep multi-seams caused by high crustal stress, high gas content, low permeability, the compound "three soft" roof and the troublesome safety situation encountered in deep-level coal exploitation, this work proposed a new idea of gob-side entry retaining without a coal pillar and Y-style ventilation in the first-mined key pressure-relieved coal seam, and a new method of coal mining and gas extraction. The following were discovered: the dynamic evolution law of the crannies in the roof as influenced by mining, the formative rule of "the vertical cranny-abundant area" along the gob side, the distribution of the air pressure field in the gob, and the flowing rule of pressure-relieved gas in a Y-style ventilation system. The study also established a theoretical basis for a new method of coal mining and gas extraction in which the pressure-relieved gas is extracted by roadway-retaining boreholes instead of roadway boreholes. The study resolved many difficult key problems, such as fast roadway retaining at the gob side without a coal pillar, Y-style ventilation, and extraction of pressure-relieved gas by roadway-retaining boreholes. The study innovated and integrated a whole set of technical systems for coal and pressure-relief gas extraction. The method of pressure-relieved gas extraction by roadway retaining has been successfully applied at 6 typical working faces in the Huainan and Huaibei mining areas. The research can provide scientific and reliable technical support and a demonstration for coal mining and gas extraction in gassy deep multi-seams with low permeability.

  2. Personal continuous route pattern mining

    Institute of Scientific and Technical Information of China (English)

    Qian YE; Ling CHEN; Gen-cai CHEN

    2009-01-01

    In daily life, people often repeat regular routes in certain periods. In this paper, a mining system is developed to find the continuous route patterns of personal past trips. In order to cope with the diversity of personal moving status, the mining system employs adaptive GPS data recording and five data filters to guarantee clean trip data. The mining system uses a client/server architecture to protect personal privacy and to reduce the computational load. The server conducts the main mining procedure, but with insufficient information to recover real personal routes. In order to improve the scalability of sequential pattern mining, a novel pattern mining algorithm, continuous route pattern mining (CRPM), is proposed. This algorithm can tolerate the different disturbances in real routes and extract the frequent patterns. Experimental results based on nine persons' trips show that CRPM can extract route patterns more than twice as long as those found by traditional route pattern mining algorithms.

  3. Method for Extracting Product Information from TV Commercial

    Directory of Open Access Journals (Sweden)

    Kohei Arai

    2011-09-01

    Full Text Available Television (TV) commercial programs contain important product information that is displayed for only a few seconds. People who need that information have insufficient time to note it down, or even just to read it. This research work focuses on automatically detecting text and extracting important information from a TV commercial to provide the information in real time and for video indexing. We propose a method for product information extraction from TV commercials using a knowledge-based system with a rule-based pattern matching method. Implementation and experiments on 50 commercial screenshot images achieved highly accurate results for text extraction and information recognition.

  4. Research on Key Technology of Mining Remote Sensing Dynamic Monitoring Information System

    Science.gov (United States)

    Sun, J.; Xiang, H.

    2017-09-01

    The problems existing in remote sensing dynamic monitoring of mining are expounded, a general approach to building a remote sensing dynamic monitoring information system is presented, and a service-oriented mechanism for the timely release of remote sensing monitoring results is established. A mobile device-based data verification subsystem is developed using mobile GIS, a remote sensing dynamic monitoring information system for mining is constructed, and a "timely release, fast handling and timely feedback" rapid response mechanism for remote sensing dynamic monitoring is implemented.

  5. Information Extraction on the Web with Credibility Guarantee

    OpenAIRE

    Nguyen, Thanh Tam

    2015-01-01

    The Web has become the central medium providing valuable sources for information extraction applications. However, such user-generated resources are often plagued by inaccuracies and misinformation due to the inherent openness and uncertainty of the Web. In this work we study the problem of extracting structured information from Web data with a credibility guarantee. The ultimate goal is not only that the structured information should be extracted as far as possible, but also that its credibility should be high. ...

  6. Information Extraction from Large-Multi-Layer Social Networks

    Science.gov (United States)

    2015-08-06

    Social networks often encode community structure using multiple distinct types of links. In this paper we introduce a novel method to extract information from such multi-layer networks, where each type of link forms its own layer. Using the concept...

  7. Automatic extraction of reference gene from literature in plants based on text mining.

    Science.gov (United States)

    He, Lin; Shen, Gengyu; Li, Fei; Huang, Shuiqing

    2015-01-01

    Real-Time Quantitative Polymerase Chain Reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to the validity of a qRT-PCR experiment. However, selecting an appropriate reference gene usually requires strict biological experiments for verification, making the selection process costly. The scientific literature has accumulated many findings on the selection of reference genes. Therefore, mining reference genes used under specific experimental environments from the literature can provide quite reliable reference genes for similar qRT-PCR experiments, with the advantages of reliability, economy and efficiency. An auxiliary method for discovering reference genes from the literature is proposed in this paper, which integrates machine learning, natural language processing and text mining approaches. The validity tests showed that this new method has better precision and recall for the extraction of reference genes and their environments.

  8. Extraction and Network Sharing of Forest Vegetation Information based on SVM

    Directory of Open Access Journals (Sweden)

    Zhang Hannv

    2013-05-01

    Full Text Available The support vector machine (SVM) is a relatively new method of data mining which can deal with regression problems (time series analysis), pattern recognition (classification, discriminant analysis) and many other issues very well. In recent years, SVM has been widely used in computer classification and recognition of remote sensing images. This paper is based on Landsat TM image data, using a classification method based on the support vector machine to extract the forest cover information of the Dahuanggou tree farm in the Changbai Mountain area, and comparing it with conventional maximum likelihood classification. The results show that the extraction accuracy of forest information based on the support vector machine (Kappa values of 0.9810, 0.9716 and 0.9753) exceeds the extraction accuracy of the maximum likelihood method (MLC, Kappa value of 0.9634); the method has good operability and practicality.
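
    A hedged scikit-learn sketch of the same SVM classification idea is shown below, using synthetic band reflectances and Cohen's kappa for accuracy; the band statistics and class structure are invented and do not come from the paper's Landsat TM data.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(0)
    forest = rng.normal([0.05, 0.30, 0.15], 0.02, (300, 3))      # toy reflectances for 3 bands
    non_forest = rng.normal([0.15, 0.20, 0.25], 0.02, (300, 3))
    X = np.vstack([forest, non_forest])
    y = np.array([1] * 300 + [0] * 300)                          # 1 = forest, 0 = non-forest

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    svm = SVC(kernel="rbf").fit(X_tr, y_tr)
    print("kappa:", cohen_kappa_score(y_te, svm.predict(X_te)))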

  9. The study of personalized information recommendation system based on data mining

    Science.gov (United States)

    Chen, Ke; Ke, Wende; Li, Sansi

    2011-12-01

    To address the contradictions and difficulties of current Internet information access, this study, on the basis of data mining techniques and recommender systems, proposes and implements an Internet-facing personalized information recommendation system based on data mining. The system is divided into an offline part and an online part. The offline part mines association rules from the site's server log files to derive the transaction modes used by the intelligent personalized recommendation service. The online part realizes the personalized intelligent recommendation service based on the mined association rules. The paper provides a method for a personalized information referral service based on association rule mining, and the system has been tested through experiments, which confirmed its feasibility and validity.
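
    The offline association-rule step can be pictured with the hedged sketch below, which mines simple "page A -> page B" rules from toy per-session page sets; the sessions and thresholds are invented and the system described in the paper is of course more elaborate.

    from itertools import combinations
    from collections import Counter

    sessions = [
        {"home", "catalog", "product", "cart"},
        {"home", "catalog", "product"},
        {"home", "search", "product"},
        {"home", "catalog"},
    ]
    min_support, min_confidence = 0.5, 0.6

    item_counts = Counter(i for s in sessions for i in s)
    pair_counts = Counter(frozenset(p) for s in sessions for p in combinations(sorted(s), 2))

    n = len(sessions)
    for pair, c in pair_counts.items():
        if c / n >= min_support:                        # frequent page pair
            a, b = tuple(pair)
            for x, y in ((a, b), (b, a)):
                conf = c / item_counts[x]
                if conf >= min_confidence:              # confident rule x -> y
                    print(f"{x} -> {y}  support={c / n:.2f} confidence={conf:.2f}")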

  10. A COMPARATIVE ANALYSIS OF WEB INFORMATION EXTRACTION TECHNIQUES DEEP LEARNING vs. NAÏVE BAYES vs. BACK PROPAGATION NEURAL NETWORKS IN WEB DOCUMENT EXTRACTION

    OpenAIRE

    J. Sharmila; Subramani, A.

    2016-01-01

    Research related to web mining is becoming more essential these days because a large amount of information is managed through the web. Web usage is expanding in an uncontrolled way, and a specific framework is required for handling such a large amount of information in the web space. Web mining is classified into three major divisions: web content mining, web usage mining and web structure mining. Tak-Lam Wong has proposed a web content mining methodolog...

  11. Data mining

    CERN Document Server

    Gorunescu, Florin

    2011-01-01

    The knowledge discovery process is as old as Homo sapiens. Until some time ago, this process was solely based on the 'natural personal' computer provided by Mother Nature. Fortunately, in recent decades the problem has begun to be solved based on the development of the Data mining technology, aided by the huge computational power of the 'artificial' computers. Digging intelligently in different large databases, data mining aims to extract implicit, previously unknown and potentially useful information from data, since 'knowledge is power'. The goal of this book is to provide, in a friendly way

  12. Heavy metal concentration in forage grasses and extractability from some acid mine spoils

    Energy Technology Data Exchange (ETDEWEB)

    Taylor, R.W.; Ibeabuchi, I.O.; Sistani, K.R.; Shuford, J.W. (Alabama A and M University, Normal (United States). Department of Plant and Soil Science)

    1993-06-01

    Laboratory and greenhouse studies were conducted on several forage grasses, bermudagrass (Cynodon dactylon), creeping red fescue (Festuca rubra), Kentucky 31 tall fescue (Festuca arundinacea), oat (Avena sativa), orchardgrass (Dactylis glomerata), perennial ryegrass (Lolium perenne), sorghum (Sorghum bicolor), triticale (X Triticosecale Wittmack), and winter wheat (Triticum aestivum), grown on three Alabama acid mine spoils to study heavy metal accumulation, dry matter yield and spoil metal extractability by three chemical extractants (Mehlich 1, DTPA, and 0.1 M HCl). Heavy metals removed by these extractants were correlated with their accumulation by the forage grasses. Among the forages tested, creeping red fescue did not survive the stressful conditions of any of the spoils, while orchardgrass and Kentucky 31 tall fescue did not grow in the Mulberry spoil. Sorghum, followed by bermudagrass, generally produced the highest dry matter yield. However, the high-yielding bermudagrass was most effective in accumulating high tissue levels of Mn and Zn from all spoils (compared to the other grasses) but did not remove Ni. On average, higher levels of metals were extracted from the spoils in the order 0.1 M HCl > Mehlich 1 > DTPA. However, DTPA extracted all the metals from the spoils, while Mehlich 1 did not extract Pb and 0.1 M HCl did not extract detectable levels of Ni. All of the extractants were quite effective in determining plant-available Zn from the spoils. For the other metals, the effective determination of plant availability depended on the crop, the extractant, and the metal in concert. 20 refs., 6 tabs.

  13. Mining and Analyzing Circulation and ILL Data for Informed Collection Development

    Science.gov (United States)

    Link, Forrest E.; Tosaka, Yuji; Weng, Cathy

    2015-01-01

    The authors investigated quantitative methods of collection use analysis employing library data that are available in ILS and ILL systems to better understand library collection use and user needs. For the purpose of the study, the authors extracted circulation and ILL records from the library's systems using data-mining techniques. By comparing…

  14. WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK – AN OVERVIEW

    OpenAIRE

    V.Lakshmi Praba; T. Vasantha

    2011-01-01

    Web mining is the extraction of interesting and potentially useful patterns and information from the Web. It covers Web documents, hyperlinks between documents, and usage logs of web sites. The significant tasks in web mining can be listed as information retrieval, information selection/extraction, generalization, and analysis. Web information retrieval tools consider only the text on pages and ignore the information in the links. The goal of Web structure mining is to explore structural summa...

  15. Association Rule Mining from an Intelligent Tutor

    Science.gov (United States)

    Dogan, Buket; Camurcu, A. Yilmaz

    2008-01-01

    Educational data mining is a very novel research area, offering fertile ground for many interesting data mining applications. Educational data mining can extract useful information from educational activities for better understanding and assessment of the student learning process. In this way, it is possible to explore how students learn topics in…

  16. The theories and key technologies for the new generation mine wireless information system

    Energy Technology Data Exchange (ETDEWEB)

    Yang, W.; Feng, X.; Cheng, S.; Sun, J. [Beijing Jiaotong University, Beijing (China). Key Laboratory of ARP Optical Network and Advanced Telecommunication Network

    2004-07-01

    To break through the traditional mine wireless communication theories and technologies, the combination of advanced wireless communication technologies and wireless network technologies with optical fiber communication technologies is proposed to construct a new generation mine wireless information system. This system offers a full range of functions such as managing mobile communications, vehicle positioning and navigation, personnel positioning and tracing, wireless multimedia surveillance, mobile computing and mine environment parameter monitoring. The relevant theories and key technologies are proposed, and the urgency of carrying out this research work in China is stressed. 10 refs., 2 figs.

  17. Safety Psychology Applicating on Coal Mine Safety Management Based on Information System

    Science.gov (United States)

    Hou, Baoyue; Chen, Fei

    In recent years, with the increasing intensity of coal mining, a great number of major accidents have happened, mostly due to human factors; unsafe human behavior is in turn affected by insecure psychological states. In order to reduce accidents and improve safety management, with the help of applied safety psychology, we analyse the causes of insecure psychological factors from the perspectives of human perception, personality development, motivation and incentives, reward and punishment mechanisms, and safety-oriented mental training, and put forward countermeasures to promote safe coal mine production and to provide information for improving the level of safety management in coal mining.

  18. Sequential extraction of heavy metals in soils from a copper mine

    Science.gov (United States)

    Arenas, Daniel; Lago, Manoel; Vega, Flora; Andrade, Luisa

    2013-04-01

    Metal mining produces a large amount of waste materials on which mine soils can form. These soils usually have important limitations for plant development, such as extreme pH and low organic matter, among others. On metal mines they also tend to have problems of pollution by heavy metals (Asensio et al., 2013), generally involving more than one metal. At Touro (Galicia, Spain), copper was mined from 1973 to 1988. Nowadays, there are soils formed on the tailings, which were built with waste and thick materials coming from copper extraction, and on the settling pond, since it is now emerged and dry. They are partly exposed to weathering, and iron, copper, sulphides and H+ can be released, causing acid mine drainage and heavy metal solubilization. Since heavy metals can adsorb onto the soil, run off into rivers or lakes, or leach into the groundwater (Mulligan et al., 2001), it is very important to study the soil mechanisms involved in both the retention and the solubility of heavy metals. Sequential extraction procedures allow these mechanisms to be better understood, since the chosen extractions attempt to minimize solubilization of other soil fractions, even though none of them is completely specific (Mulligan et al., 2001). At the Touro mine, five soils were sampled and analysed for those properties known to determine heavy metal retention. The distribution of Cr, Cu, Mn, Ni, Pb and Zn among geochemical soil phases was analysed following the modified sequential extraction technique of Shuman (1979, 1985). The concentrations in the extractions were analysed by ICP-OES. The results show that most of the heavy metal content is associated with the residual fraction in all soils: Cr (85-92%), Cu (53-81%), Mn (80-98%), Ni (86-96%), Pb (47-81%) and Zn (85-95%). The high crystalline Fe-oxide content also plays an important role, especially for Cu (18-22% of the total Cu). The amount of heavy metals associated with soil organic matter is very low (Pb and Cu: heavy metal contents are strongly retained in poorly accessible soil fractions. Still

  19. Sample-based XPath Ranking for Web Information Extraction

    NARCIS (Netherlands)

    Jundt, Oliver; van Keulen, Maurice

    Web information extraction typically relies on a wrapper, i.e., program code or a configuration that specifies how to extract some information from web pages at a specific website. Manually creating and maintaining wrappers is a cumbersome and error-prone task. It may even be prohibitive as some

  20. miRTex: A Text Mining System for miRNA-Gene Relation Extraction.

    Science.gov (United States)

    Li, Gang; Ross, Karen E; Arighi, Cecilia N; Peng, Yifan; Wu, Cathy H; Vijay-Shanker, K

    2015-01-01

    MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.

  1. Information and communication technology and climate change adaptation: Evidence from selected mining companies in South Africa

    OpenAIRE

    Bartholomew I. Aleke; Godwell Nhamo

    2016-01-01

    The mining sector is a significant contributor to the gross domestic product of many global economies. Given the increasing trends in climate-induced disasters and the growing desire to find lasting solutions, information and communication technology (ICT) has been introduced into the climate change adaptation mix. Climate change-induced extreme weather events such as flooding, drought, excessive fog, and cyclones have compounded the environmental challenges faced by the mining sector. T...

  2. Pressure-relief and methane production performance of pressure relief gas extraction technology in the longwall mining

    Science.gov (United States)

    Zhang, Cun; Tu, Shihao; Chen, Min; Zhang, Lei

    2017-02-01

    Pressure relief gas extraction technology (PRGET) has been successfully implemented at many locations as a coal mine methane exploitation and outburst prevention technology. Comprehensive PRGET, including gob gas ventholes (GGV), crossing-seam drilling holes (CSDH), large-diameter horizontal long drilling holes (LDHLDH) and buried pipes for extraction (BPE), has been used to extract abundant pressure-relief methane (PRM) during protective coal seam mining; these techniques mitigated dangers associated with coal and gas outbursts in 13-1 coal seam mining in the Huainan coalfield. These extraction technologies can ensure safe protective seam mining and effectively extract coal and gas. This article analyses PRGET production performance and verifies it against field measurements. The results showed that PRGET drilling to extract PRM from the protected coal seam significantly reduced methane emissions from the longwall ventilation system and produced highly efficient extraction. Material balance analyses indicated a significant decrease in gas content and pressure in the protected coal seam, from 8.78 m3/t and 4.2 MPa to 2.34 m3/t and 0.285 MPa, respectively. The field measurement results of the residual gas content in the protected coal seam (the 13-1 coal seam) confirmed the reliability of the material balance analyses, and the pressure relief range of PRGET in the protected coal seam was obtained.

  3. Nickel solvent extraction from cold purification filter cakes of Angouran mine concentrate using LIX984N

    Institute of Scientific and Technical Information of China (English)

    AA Balesini; A Zakeri; H Razavizadeh; A Khani

    2013-01-01

    Cold purification filter cakes generated in the hydrometallurgical processing of Angouran mine zinc concentrate commonly contain significant amounts of Zn, Cd, and Ni ions and thus are valuable resources for metal recovery. In this research, a nickel-containing solution that was obtained from sulfuric acid leaching of the filter cake following cadmium and zinc removal was subjected to solvent extraction experiments using 10 vol% LIX984N diluted in kerosene. Under optimum experimental conditions (pH 5.3, volume ratio of organic/aqueous (O:A) = 2:1, and contact time = 5 min), more than 97.1% of nickel was extracted. Nickel was stripped from the loaded organic phase by contacting it with a 200 g/L sulfuric acid solution, from which 77.7% of nickel was recovered in a single contact at the optimum conditions (pH 1-1.5, O:A = 5:1, and contact time = 15 min).

  4. The Agent of extracting Internet Information with Lead Order

    Science.gov (United States)

    Mo, Zan; Huang, Chuliang; Liu, Aijun

    In order to carry out e-commerce better, advanced technologies for accessing business information are urgently needed. An agent is described to deal with the problems of extracting internet information caused by the non-standard and disordered structure of Chinese websites. The designed agent includes three modules, each corresponding to a step of the information extraction process. An HTTP tree method and a Lead algorithm are proposed to generate a lead order, with which the required web pages can be retrieved easily. How to structure the extracted natural language information is also discussed.

  5. Research into comprehensive gas extraction technology of single coal seams with low permeability in the Jiaozuo coal mining area

    Institute of Scientific and Technical Information of China (English)

    Fu Jiangwei; Fu Xuehai; Hu Xiao; Chen Li; Ou Jianchun

    2011-01-01

    For a low-permeability single coal seam prone to gas outbursts, pre-drainage of gas is difficult and inefficient, seriously restricting the safety and efficiency of production. Radical measures for increasing gas extraction efficiency are pressure relief and infrared antireflection. Taking a major coal mine of the Jiaozuo Industrial (Group) Co. as our test area, we have analyzed the effect of mining conditions and the regularity of mine pressure distribution in front of the working face, studied the width of the depressurization zone in slice mining, and analyzed gas extraction efficiency and fast drainage in the advanced stress relaxation zone. On that basis, we further investigated and practiced the exploitation technology of shallow drilling, fan drilling and grid-shaped drilling at the working face. Practice and our results show that the stress relaxation zone is the ideal region for quick and efficient extraction of gas. By means of an integrated extraction technology, the amount of gas emitted into the zone was greatly reduced, while the risk of dangerous outbursts of coal and gas was lowered markedly. This exploration provides a new way to control gas in working faces of coal mines with low permeability and risk of gas outbursts in single coal seams in the Jiaozuo mining area.

  6. A COMPARATIVE ANALYSIS OF WEB INFORMATION EXTRACTION TECHNIQUES DEEP LEARNING vs. NAÏVE BAYES vs. BACK PROPAGATION NEURAL NETWORKS IN WEB DOCUMENT EXTRACTION

    Directory of Open Access Journals (Sweden)

    J. Sharmila

    2016-01-01

    Full Text Available Web mining research is becoming increasingly important because a large amount of information is now managed through the web, and web usage is growing in an uncontrolled way; a dedicated framework is therefore required for handling such large volumes of data in the web space. Web mining is commonly divided into three major areas: web content mining, web usage mining and web structure mining. Tak-Lam Wong proposed a web content mining methodology based on Bayesian Networks (BN), addressing web data extraction and attribute discovery. Motivated by that work, we propose a web content mining methodology based on a Deep Learning algorithm, which offers an advantage over BN because BN is not embedded in a learning architecture of the kind the proposed system uses. The main objective of this investigation is web document extraction using different classification algorithms and their analysis. The work extracts data from web URLs and compares three classification algorithms: Deep Learning, Naive Bayes and back-propagation neural networks (BPNN). Deep Learning is a powerful family of techniques for learning in neural networks, applied in areas such as computer vision, speech recognition, natural language processing and biometrics, and it requires comparatively little time for classification. Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumptions between the features. The BPNN algorithm is then used for classification. Initially, the training and testing dataset contains many URLs, from which the content is extracted. The
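
    As a hedged illustration of the kind of Naive Bayes baseline compared above (not the authors' implementation), the following minimal Python sketch classifies web documents using bag-of-words features and a multinomial Naive Bayes model from scikit-learn; the documents and labels are invented placeholders.

    # Minimal sketch of a Naive Bayes web-document classifier (illustrative only):
    # bag-of-words features plus a multinomial Naive Bayes model.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_docs = ["cheap flights hotel booking deals",
                  "protein sequence alignment genome analysis",
                  "football match score league results"]
    train_labels = ["travel", "science", "sports"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_docs, train_labels)
    print(model.predict(["genome wide association analysis"]))  # expected: ['science']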

  7. Pattern information extraction from crystal structures

    OpenAIRE

    Okuyan, Erhan

    2005-01-01

    Cataloged from PDF version of article. Determining crystal structure parameters of a material is a quite important issue in crystallography. Knowing the crystal structure parameters helps to understand physical behavior of material. For complex structures, particularly for materials which also contain local symmetry as well as global symmetry, obtaining crystal parameters can be quite hard. This work provides a tool that will extract crystal parameters such as primitive vect...

  8. Undermining the state? Informal mining and trajectories of state formation in Eastern Mindanao, Philippines

    NARCIS (Netherlands)

    Verbrugge, B.L.P.

    2015-01-01

    Building on critical perspectives on the state and the informal economy, this article provides an analysis of the “state of the state” on the eastern Mindanao mineral frontier. In the first instance, the author explains that the massive expansion of informal small-scale gold mining, instead of under

  9. Text mining of cancer-related information: review of current status and future directions.

    Science.gov (United States)

    Spasić, Irena; Livsey, Jacqueline; Keane, John A; Nenadić, Goran

    2014-09-01

    This paper reviews the research literature on text mining (TM) with the aim to find out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research. A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar. A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main
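
    To make the symbolic methods highlighted by the review concrete, the following toy Python sketch shows dictionary-lookup named entity recognition followed by a pattern-matching extraction rule; the dictionaries, pattern and sentence are invented for illustration and are not drawn from any of the reviewed systems.

    import re

    cancer_terms = {"breast cancer", "melanoma", "lung carcinoma"}
    drug_terms = {"tamoxifen", "cisplatin"}

    def tag_entities(text):
        # dictionary-lookup NER: scan the text for every known term
        entities = []
        for term in cancer_terms | drug_terms:
            for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
                label = "DISEASE" if term in cancer_terms else "DRUG"
                entities.append((m.start(), m.end(), label, m.group()))
        return sorted(entities)

    def extract_treatment_relations(text):
        # crude pattern-matching IE: "<drug> in/for the treatment of <disease>"
        pattern = r"(\w+)\s+(?:in|for)\s+the\s+treatment\s+of\s+([\w ]+)"
        return re.findall(pattern, text, flags=re.IGNORECASE)

    report = "Patients received tamoxifen in the treatment of breast cancer."
    print(tag_entities(report))
    print(extract_treatment_relations(report))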

  10. Data Mining – Innovative Method for Obtaining Information in Marketing and Business Management

    Directory of Open Access Journals (Sweden)

    Mirela-Cristina Voicu

    2011-05-01

    Full Text Available The existence of massive amounts of data has raised the question of reorienting their use from a retrospective to a prospective operation. Data mining offers the promise of an important aid for discovering hidden patterns in data that can be used to predict the behavior of customers, products and processes. Data mining tools must be guided by users who understand the business, the general nature of the data and the analytical methods involved. Data mining discovers information within the data that queries and reports cannot effectively reveal. It is vital to collect and prepare the data properly so that the models reflect reality. Choosing the most appropriate data mining product means finding a tool with the required capabilities and an interface that matches the skills of the users, and one that can be applied to a specific business problem. In this context, the purpose of this paper is to illustrate some of the problems of company activity which can be solved by using data mining techniques.

  11. A Computer-aided Application for Modeling and Monitoring Operational and Maintenance Information in Mining Trucks

    Science.gov (United States)

    Nikulin, Christopher; Ulloa, Andres; Carmona, Carlos; Creixell, Werner

    2016-09-01

    The combination of maintenance planning and key performance indicators is relevant for creating a more holistic picture of mining activities. On the one hand, reliability and maintainability are system characteristics suitable for planning maintenance strategies. On the other hand, key performance indicators are suitable for analyzing cost and resource consumption information about mining equipment. Nevertheless, in practice both approaches are modeled separately, and frequently by different teams of a mining company. With this in mind, a computer-aided application was conceived to drive the operational and maintenance strategy more effectively in a complex process where the equipment is in continuous movement, such as the transportation process in an open-pit mine.

  12. A Novel Approach to Extracting Casing Status Features Using Data Mining

    Directory of Open Access Journals (Sweden)

    Jikai Chen

    2013-12-01

    Full Text Available Casing coupling location signals provided by the magnetic localizer in retractors are typically used to ascertain the position of casing couplings in horizontal wells. However, the casing coupling location signal is usually submerged in noise, which will result in the failure of casing coupling detection under the harsh logging environment conditions. The limitation of Shannon wavelet time entropy, in the feature extraction of casing status, is presented by analyzing its application mechanism, and a corresponding improved algorithm is subsequently proposed. On the basis of wavelet transform, two derivative algorithms, singular values decomposition and Tsallis entropy theory, are proposed and their physics meanings are researched. Meanwhile, a novel data mining approach to extract casing status features with Tsallis wavelet singularity entropy is put forward in this paper. The theoretical analysis and experiment results indicate that the proposed approach can not only extract the casing coupling features accurately, but also identify the characteristics of perforation and local corrosion in casings. The innovation of the paper is in the use of simple wavelet entropy algorithms to extract the complex nonlinear logging signal features of a horizontal well tractor.
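
    The general idea behind a wavelet singularity entropy feature can be sketched as follows (an illustrative reading of the approach, not the paper's exact algorithm): decompose the signal with a wavelet transform, assemble the sub-band coefficients into a matrix, take its singular values, and compute the Tsallis entropy S_q = (1 - sum(p_i^q)) / (q - 1) over the normalised singular values. The wavelet, decomposition level, q and the test signal below are assumptions.

    import numpy as np
    import pywt

    def tsallis_wavelet_singularity_entropy(signal, wavelet="db4", level=4, q=2.0):
        # wavelet decomposition of the logging signal into sub-band coefficients
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        # pad sub-bands to equal length so they stack into a matrix
        n = max(len(c) for c in coeffs)
        matrix = np.vstack([np.pad(c, (0, n - len(c))) for c in coeffs])
        # singular values summarise the energy structure of the coefficient matrix
        s = np.linalg.svd(matrix, compute_uv=False)
        p = s / s.sum()
        # Tsallis entropy over the normalised singular-value distribution
        return (1.0 - np.sum(p ** q)) / (q - 1.0)

    t = np.linspace(0, 1, 1024)
    noisy = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)
    print(tsallis_wavelet_singularity_entropy(noisy))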

  13. Real-Time Information Extraction from Big Data

    Science.gov (United States)

    2015-10-01

    Institute for Defense Analyses report: Real-Time Information Extraction from Big Data. Jagdeep Shah, Robert M. Rolfe, Francisco L. Loaiza-Lemos; October 7, 2015. Abstract: We are drowning under the 3 Vs (volume, velocity and variety) of big data. Real-time information extraction from big

  14. An architecture for biological information extraction and representation.

    Science.gov (United States)

    Vailaya, Aditya; Bluvas, Peter; Kincaid, Robert; Kuchinsky, Allan; Creech, Michael; Adler, Annette

    2005-02-15

    Technological advances in biomedical research are generating a plethora of heterogeneous data at a high rate. There is a critical need for extraction, integration and management tools for information discovery and synthesis from these heterogeneous data. In this paper, we present a general architecture, called ALFA, for information extraction and representation from diverse biological data. The ALFA architecture consists of: (i) a networked, hierarchical, hyper-graph object model for representing information from heterogeneous data sources in a standardized, structured format; and (ii) a suite of integrated, interactive software tools for information extraction and representation from diverse biological data sources. As part of our research efforts to explore this space, we have currently prototyped the ALFA object model and a set of interactive software tools for searching, filtering, and extracting information from scientific text. In particular, we describe BioFerret, a meta-search tool for searching and filtering relevant information from the web, and ALFA Text Viewer, an interactive tool for user-guided extraction, disambiguation, and representation of information from scientific text. We further demonstrate the potential of our tools in integrating the extracted information with experimental data and diagrammatic biological models via the common underlying ALFA representation. aditya_vailaya@agilent.com.

  15. Information Extraction from Unstructured Text for the Biodefense Knowledge Center

    Energy Technology Data Exchange (ETDEWEB)

    Samatova, N F; Park, B; Krishnamurthy, R; Munavalli, R; Symons, C; Buttler, D J; Cottom, T; Critchlow, T J; Slezak, T

    2005-04-29

    The Bio-Encyclopedia at the Biodefense Knowledge Center (BKC) is being constructed to allow an early detection of emerging biological threats to homeland security. It requires highly structured information extracted from variety of data sources. However, the quantity of new and vital information available from every day sources cannot be assimilated by hand, and therefore reliable high-throughput information extraction techniques are much anticipated. In support of the BKC, Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, together with the University of Utah, are developing an information extraction system built around the bioterrorism domain. This paper reports two important pieces of our effort integrated in the system: key phrase extraction and semantic tagging. Whereas two key phrase extraction technologies developed during the course of project help identify relevant texts, our state-of-the-art semantic tagging system can pinpoint phrases related to emerging biological threats. Also we are enhancing and tailoring the Bio-Encyclopedia by augmenting semantic dictionaries and extracting details of important events, such as suspected disease outbreaks. Some of these technologies have already been applied to large corpora of free text sources vital to the BKC mission, including ProMED-mail, PubMed abstracts, and the DHS's Information Analysis and Infrastructure Protection (IAIP) news clippings. In order to address the challenges involved in incorporating such large amounts of unstructured text, the overall system is focused on precise extraction of the most relevant information for inclusion in the BKC.

  16. Source-specific Informative Prior for i-Vector Extraction

    DEFF Research Database (Denmark)

    Shepstone, Sven Ewan; Lee, Kong Aik; Li, Haizhou

    2015-01-01

    -informative, since for homogeneous datasets there is no gain in generality in using an informative prior. This work shows that extracting i-vectors for a heterogeneous dataset, containing speech samples recorded from multiple sources, using informative priors instead is applicable, and leads to favorable results...

  17. Concept and Establishment of the Mine Information System within the CROMAC GIP Project

    Directory of Open Access Journals (Sweden)

    Zvonko Biljecki

    2006-12-01

    Full Text Available In order to solve mine problems in the Republic of Croatia, a unique project, CROMAC GIP (Croatian Mine Action Centre Geoinformation Project), has been initiated, significantly increasing the functional quality of the existing Mine Information System (MIS). Since mine problems are closely related to space, geodata are a crucial part of the MIS intended for monitoring and planning of demining. From the moment the Croatian Mine Action Centre was founded until today, the process of demining has progressed; the implementation of a topographic database in accordance with the CROTIS data model and the usage of orthophoto data produced according to the official product specifications can be pointed out in that progress. Usage of such geodata requires a sophisticated information system that enables simultaneous usage of geodata and other data connected with solving mine problems. In order to reach all goals in demining and to use all advantages of geodata, it was indispensable to upgrade the existing Mine Information System by merging geodata and HCR data and to collect new data according to standardized procedures, while at the same time controlling quality and using automated procedures for uploading into the system. Apart from being constructed in accordance with the Standard Operative Procedures (SOP), the modernised MIS is also based on generally accepted standards in the field of geoinformation and is implemented on advanced technology. The core of the system is an Oracle database, and GeoMedia WebMap Professional is the tool on the basis of which distribution of and work with spatial data are possible over intranet/Internet. In order to achieve full efficiency of the system, it is necessary to provide high-quality and updated geodata. In this respect, photogrammetric data are the most efficient solution.

  18. Preparation and examination of monolithic in-needle extraction (MINE) device for the direct analysis of liquid samples

    Energy Technology Data Exchange (ETDEWEB)

    Pietrzyńska, Monika, E-mail: monikapietrzynska@gmail.com; Voelkel, Adam; Bielicka-Daszkiewicz, Katarzyna

    2013-05-07

    Highlights:
    • MINE device for isolation of analytes from water samples.
    • Nine poly(styrene-divinylbenzene) polymer monoliths prepared in stainless steel needles.
    • High efficiency of in-needle extraction systems based on monolithic materials.
    • New possibilities in the sample preparation area.
    Abstract: The combination of extraction and chromatographic techniques opens new possibilities in the area of sample preparation. Macroporous poly(styrene-divinylbenzene) (PS-DVB) monoliths were prepared by in situ polymerization in stainless steel needles. The surface of the stainless steel needle was first modified with a silane coupling agent. The monolithic materials located inside the needles were used as the in-needle extraction device. Scanning electron microscope (SEM) images were obtained for nine monoliths, and spectra of the prepared materials were recorded using two techniques: Attenuated Total Reflectance (ATR) and Fourier Transform Infrared Spectroscopy (FTIR). The new monolithic in-needle extraction (MINE) devices were used in the preparation of a series of test water samples for chromatographic analysis. The extraction of phenolic compounds from water samples was carried out by pumping liquid samples through the MINE device. The results indicate a high efficiency of in-needle extraction systems based on monolithic materials. The breakthrough volume and the sorption efficiency of the prepared monolithic in-needle extraction devices were determined experimentally. The achieved recovery was close to 90%, and the determined LOQ values varied between 0.4 and 6 μg.

  19. Can we replace curation with information extraction software?

    Science.gov (United States)

    Karp, Peter D

    2016-01-01

    Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current IEP programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.Database URL.

  20. Moving Target Information Extraction Based on Single Satellite Image

    Directory of Open Access Journals (Sweden)

    ZHAO Shihu

    2015-03-01

    Full Text Available The spatial and time variant effects in high-resolution satellite push-broom imaging are analyzed, and a spatial and time variant imaging model is established. A moving target information extraction method is proposed based on a single satellite remote sensing image. The experiment computes the flying speeds of two airplanes using a ZY-3 multispectral image and demonstrates the validity of the spatial and time variant model and the moving target information extraction method.
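
    The underlying geometry can be illustrated with a hedged back-of-the-envelope calculation: in push-broom imaging, different bands observe the same ground point a fraction of a second apart, so a moving target appears displaced between bands and its speed is roughly the displacement divided by the time lag. The numbers below are made up for illustration and are not ZY-3 parameters.

    # Illustrative arithmetic only (made-up numbers, not ZY-3 sensor parameters):
    # a push-broom sensor images the same ground point in two bands a short time
    # apart, so a moving target appears displaced between the bands.
    pixel_size_m = 5.8       # assumed ground sample distance
    displacement_px = 4.0    # assumed apparent shift of the target between bands
    time_lag_s = 0.3         # assumed inter-band acquisition time lag

    speed_mps = displacement_px * pixel_size_m / time_lag_s
    print(round(speed_mps, 1), "m/s")   # about 77.3 m/s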

  1. Pattern information extraction from crystal structures

    Science.gov (United States)

    Okuyan, Erhan; Güdükbay, Uğur; Gülseren, Oğuz

    2007-04-01

    Determining the crystal structure parameters of a material is an important issue in crystallography and material science. Knowing the crystal structure parameters helps in understanding the physical behavior of material. It can be difficult to obtain crystal parameters for complex structures, particularly those materials that show local symmetry as well as global symmetry. This work provides a tool that extracts crystal parameters such as primitive vectors, basis vectors and space groups from the atomic coordinates of crystal structures. A visualization tool for examining crystals is also provided. Accordingly, this work could help crystallographers, chemists and material scientists to analyze crystal structures efficiently.
    Program summary
    Title of program: BilKristal
    Catalogue identifier: ADYU_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYU_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Licensing provisions: None
    Programming language used: C, C++, Microsoft .NET Framework 1.1 and OpenGL Libraries
    Computer: Personal Computers with Windows operating system
    Operating system: Windows XP Professional
    RAM: 20-60 MB
    No. of lines in distributed program, including test data, etc.: 899 779
    No. of bytes in distributed program, including test data, etc.: 9 271 521
    Distribution format: tar.gz
    External routines/libraries: Microsoft .NET Framework 1.1. For the visualization tool, the graphics card driver should also support OpenGL
    Nature of problem: Determining crystal structure parameters of a material is a quite important issue in crystallography. Knowing the crystal structure parameters helps to understand physical behavior of material. For complex structures, particularly for materials which also contain local symmetry as well as global symmetry, obtaining crystal parameters can be quite hard.
    Solution method: The tool extracts crystal parameters such as primitive vectors, basis vectors and identify the space group from

  2. Gold-Mining

    DEFF Research Database (Denmark)

    Raaballe, J.; Grundy, B.D.

    2002-01-01

    of operating gold mines. Asymmetric information on the reserves in the mine implies that, at a high enough price of gold, the manager of high type finds the extraction value of the company to be higher than the current market value of the non-operating gold mine. Due to this undervaluation, the maxim of market value maximization forces the manager of high type to extract the gold. The implications are three-fold. First, all managers (except the lowest type) extract the gold too soon compared to the first-best policy of leaving the gold in the mine forever. Second, a manager of high type extracts the gold sooner than a manager of lower type. Third, a non-operating gold mine is valued as being of the lowest type in the pool and, all else equal, high-asymmetry mines are valued lower than low-asymmetry mines. In a qualitative sense these results are robust with respect to different assumptions (re cost

  3. An effective method of DNA extraction for bioleaching bacteria from acid mine drainage.

    Science.gov (United States)

    Zeng, Leping; Huang, Jufang; Zhang, Yanfei; Qiu, Guanzhou; Tong, Jianbin; Chen, Dan; Zhou, Jin; Luo, Xuegang

    2008-07-01

    An effective and versatile method for microorganism lysis and direct extraction of DNA from bioleaching bacteria was developed using pure cultures and an acid mine drainage (AMD) sediment sample. In the described method, microorganisms are treated at three different incubation temperatures: boiling water incubation for 6-10 min, followed by 60 +/- 5 degrees C for 30 min, then 72 degrees C for 30 min. The extracted DNA is purified using a phenol/chloroform/alcohol mixture and precipitated in absolute alcohol. The 16S ribosomal RNA (rRNA) and gyrB genes of the pure cultures were amplified using the polymerase chain reaction (PCR) and differentiated using repetitive intergenic DNA sequences amplification (Rep-PCR). For the AMD sediment sample, the 16S rRNA and gyrB genes of the amplicons were digested with Hin6I and MspI, and the restriction fragment length polymorphism analysis patterns were used as a fingerprint to discern community diversity. The results indicated that this method is a versatile, reproducible, effective, and rapid technique for routine DNA extraction from bioleaching bacteria. The low cost of this method also makes it attractive for large-scale studies.

  4. Integrating Information Extraction Agents into a Tourism Recommender System

    Science.gov (United States)

    Esparcia, Sergio; Sánchez-Anguix, Víctor; Argente, Estefanía; García-Fornes, Ana; Julián, Vicente

    Recommender systems face some problems. On the one hand information needs to be maintained updated, which can result in a costly task if it is not performed automatically. On the other hand, it may be interesting to include third party services in the recommendation since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques in order to automatically extract and classify information from the Web. Its goal is to maintain the system updated and obtain information about third party services that are not offered by service providers inside the system.

  5. Aluminium and iron estimated by Mehlich-3 extractant in mine soils in Galicia, northwest Spain

    Energy Technology Data Exchange (ETDEWEB)

    Marcos, M.L.F.; Alvarez, E.; Monterroso, C. [University of Santiago, Lugo (Spain). Dept. of Edaphology

    1998-07-01

    The efficiency of the Mehlich-3 reagent as an extractant for aluminium (Al) and iron (Fe) was studied in Galician coal mine soils in the process of reclamation. Mehlich-3 Al and Fe values were compared to those from other Al and Fe tests and with phosphorus (P) sorption. The soils are very heterogeneous, consisting mainly of carbonaceous and non-carbonaceous clays and shales, which are often rich in pyrite. Some of them have been amended with topsoil or fly ash. A close relationship was observed between Mehlich-3 Al and oxalate Al values (r = 0.77), although the regression line tended to be curvilinear. Mehlich-3 Al was better correlated than oxalate Al to pyrophosphate Al (r = 0.66 vs. r = 0.59) and also to pH-NaF (r = 0.89 vs. r = 0.74). The Mehlich-3 Al was almost as good as oxalate Al in estimating non-crystalline Al.

  6. A Study on Coastline Extraction and Its Trend Based on Remote Sensing Image Data Mining

    Directory of Open Access Journals (Sweden)

    Yun Zhang

    2013-01-01

    Full Text Available In this paper, data mining theory is applied to the pretreatment of remote sensing images. The results show that pretreating low-precision remote sensing images with a multisource image matching algorithm based on the SIFT operator, geometric correction of satellite images with scarce control points, and other techniques is effective; the coastline extracted by an edge detection method based on a chromatic-aberration Canny operator is highly coincident with the actual measured result; and, using the grey prediction method, the coastline length of China is predicted to increase in the future, with the total length reaching up to 19,471,983 m by 2015.
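
    The grey prediction step mentioned above is typically a GM(1,1) model; the following minimal Python sketch implements the standard GM(1,1) forecast (not the paper's code) on a made-up series of coastline lengths.

    import numpy as np

    def gm11_forecast(x0, steps=1):
        # standard GM(1,1): fit a first-order grey model to the accumulated series
        x0 = np.asarray(x0, dtype=float)
        x1 = np.cumsum(x0)                                # accumulated generating series
        z1 = 0.5 * (x1[1:] + x1[:-1])                     # background (mean) values
        B = np.column_stack([-z1, np.ones(len(z1))])
        a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]  # developing coefficient, grey input
        k = np.arange(len(x0) + steps)
        x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # fitted accumulated series
        x0_hat = np.concatenate(([x0[0]], np.diff(x1_hat)))
        return x0_hat[-steps:]                            # values beyond the observed data

    lengths_m = [18950e3, 19020e3, 19110e3, 19230e3, 19340e3]  # made-up coastline lengths
    print(gm11_forecast(lengths_m, steps=2))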

  7. The economic logic of persistent informality: Artisanal and small-scale mining in the Southern Philippines

    NARCIS (Netherlands)

    Verbrugge, B.L.P.

    2015-01-01

    This article critically evaluates existing causal explanations for the persistence of informality in artisanal and small-scale mining (ASM). These explanations share a legalistic focus on entry barriers and political impediments that prevent or discourage the formalization of poverty-driven ASM oper

  8. Concurrent computation of connected pattern spectra for very large image information mining

    NARCIS (Netherlands)

    Wilkinson, Michael; Moschini, Ugo; Ouzounis, G.K.; Pesaresi, M.

    2012-01-01

    This paper presents a shared-memory parallel algorithm for computing connected pattern spectra from the Max-Tree structure. The pattern spectrum is an aggregated feature space derived directly from the tree-based image representation and is a powerful tool for interactive image information mining.

  9. Towards a human eye behavior model by applying Data Mining Techniques on Gaze Information from IEC

    CERN Document Server

    Pallez, Denis; Baccino, Thierry

    2008-01-01

    In this paper, we first present what Interactive Evolutionary Computation (IEC) is and briefly describe how we have combined this artificial intelligence technique with an eye-tracker for visual optimization. Next, in order to correctly parameterize our application, we present results from applying data mining techniques to gaze information coming from experiments conducted on about 80 human individuals.

  10. Metal extraction by Alyssum serpyllifolium ssp. lusitanicum on mine-spoil soils from Spain.

    Science.gov (United States)

    Kidd, P S; Monterroso, C

    2005-01-05

    The efficiency of Alyssum serpyllifolium ssp. lusitanicum (Brassicaceae) for use in phytoextraction of polymetallic contaminated soils was evaluated. A. serpyllifolium was grown on two mine-spoil soils (MS1 and MS2): MS1 is contaminated with Cr (283 mg kg(-1)) and MS2 is moderately contaminated with Cr (263 mg kg(-1)), Cu (264 mg kg(-1)), Pb (1433 mg kg(-1)) and Zn (377 mg kg(-1)). Soils were limed to about pH 6.0 (MS1/Ca and MS2/Ca) or limed and amended with NPK fertilisers (MS1/NPK and MS2/NPK). Biomass was reduced on MS2/Ca due to Cu phytotoxicity. Fertilisation increased biomass by 10-fold on MS1/NPK, but root growth was reduced by 7-fold compared with MS1/Ca. Plants accumulated Mn, Ni and Zn in shoots, and both metal content and transportation were generally greater in MS2 than in MS1. Zinc bioaccumulation factors (BF, shoot([metal])/soil([metal])) were significantly greater in MS2 than in MS1. However, metal yields were greatest in plants grown on MS1/NPK. Concentrations of EDTA-, NH(4)Cl- and Mehlich 3 (M3)-extractable Mn and Zn were greater after plant growth. Concentrations of M3-extractable Cr, Ni, Pb and Zn were increased at the rhizosphere. Sequential extractions showed changes in the metal distribution among different soil fractions after growth. This could reflect the buffering capacity of these soils or the plants' ability to mobilise metals from less plant-available soil pools. Results suggest that A. serpyllifolium could be suitable for phytoextraction uses in polymetallic-contaminated soils, provided Cu concentrations were not phytotoxic. However, further optimisation of growth and metal extraction are required.

  11. Possibilities of Utilization of Aggregates and Extractive Waste from Hard Coal Mining at Janina Mine in the Process of Reclamation of Open-pit Mines

    National Research Council Canada - National Science Library

    Beata Klojzy-Karczmarczyk; Janusz Mazurek; Krzysztof Paw

    2016-01-01

    In recent years, the economic importance of gangue mined during coal production has changed and it is currently treated more and more often not as waste but as a source of mineral resources for economic use...

  12. Improving information extraction using a probability-based approach

    DEFF Research Database (Denmark)

    Kim, S.; Ahmed, Saeema; Wallace, K.

    2007-01-01

    or retire. It is becoming essential to retrieve vital information from archived product documents, if it is available. There is, therefore, great interest in ways of extracting relevant and sharable information from documents. A keyword-based search is commonly used, but studies have shown...

  13. Advances in research methods for information systems research data mining, data envelopment analysis, value focused thinking

    CERN Document Server

    Osei-Bryson, Kweku-Muata

    2013-01-01

    Advances in social science research methodologies and data analytic methods are changing the way research in information systems is conducted. New developments in statistical software technologies for data mining (DM) such as regression splines or decision tree induction can be used to assist researchers in systematic post-positivist theory testing and development. Established management science techniques like data envelopment analysis (DEA), and value focused thinking (VFT) can be used in combination with traditional statistical analysis and data mining techniques to more effectively explore

  14. The study of the extraction of 3-D informations

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Min Ki [Korea Univ., Seoul (Korea)]; Kim, Jin Hun; Kim, Hui Yung; Lee, Gi Sik; Lee, Yung Shin [Sokyung Univ., Seoul (Korea)]

    1998-04-01

    To extract three-dimensional information from the 3-D real world, two methods are applied (the stereo image method and the virtual reality environment method). 1. Stereo image method: from pairs of stereo images, matching methods are applied to find the corresponding points in the two images, and various methods are applied to solve this correspondence problem. 2. Virtual reality environment method: as an alternative way to extract 3-D information, a virtual reality environment is used; it is very useful for finding the 6 DOF of some given target points in 3-D space. We considered the accuracy and reliability of the 3-D information. 34 figs., 4 tabs. (Author)

  15. Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

    Directory of Open Access Journals (Sweden)

    André SANTOS

    2012-07-01

    Full Text Available Scientific publications are the main vehicle to disseminate information in the field of biotechnology for wastewater treatment. Indeed, the new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, prone-to-errors and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs are hardly reaching engineers, hampering the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.

  16. Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

    Directory of Open Access Journals (Sweden)

    Anália LOURENÇO

    2013-07-01

    Full Text Available Scientific publications are the main vehicle to disseminate information in the field of biotechnology for wastewater treatment. Indeed, the new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, prone-to-errors and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs are hardly reaching engineers, hampering the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
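
    A central step in such a workflow is spotting numeric values together with their measurement units in sentences. The toy Python sketch below illustrates this with a regular expression; the parameter names, unit list and sample sentence are invented and are not taken from the described system.

    import re

    UNITS = r"(?:mg/L|g/L|mL/min|°C|%|h|d)"
    PATTERN = re.compile(
        rf"(?P<param>pH|COD|temperature|HRT)?\s*(?:of|=|was|:)?\s*"
        rf"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>{UNITS})?",
        flags=re.IGNORECASE,
    )

    sentence = ("The reactor was operated at 35 °C with an HRT of 12 h "
                "and an influent COD of 1500 mg/L at pH 7.2.")

    for m in PATTERN.finditer(sentence):
        # keep only matches that carry a unit or a recognised parameter name
        if m.group("unit") or m.group("param"):
            print(m.group("param"), m.group("value"), m.group("unit"))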

  17. EXTRACT

    DEFF Research Database (Denmark)

    Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra

    2016-01-01

    The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have the… and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/

  18. The enhanced mine communications and information systems. The development of the Nexsys realtime risk management system

    Energy Technology Data Exchange (ETDEWEB)

    Haustein, K.; Rowan, G. [CSIRO Exploration and Mining (Australia)

    2007-03-15

    The article describes two safety projects under way between JCOAL in Japan and CSIRO (Australia) which are concluding in March 2007. The first was to develop a real-time roof fall monitoring and warning system for underground coal mines. The system consisted of extensometers, stress meters and a seismic monitoring system. It was installed at the Ulan colliery in New South Wales. The output of the system is a set of probabilities of a roof fall happening within various periods of time. The three instruments have colour-coded warning lights. The second project, the enhanced mine communications and information systems for real-time risk analysis project, collects and analyses data from diverse sources with the Nexsys™ hardware and software system. It is now installed in two mines in Australia and one in Japan. The system is described in detail in the article. 2 refs., 6 figs.

  19. Visualization and Integrated Data Mining of Disparate Information

    Energy Technology Data Exchange (ETDEWEB)

    Saffer, Jeffrey D.(OMNIVIZ, INC); Albright, Cory L.(BATTELLE (PACIFIC NW LAB)); Calapristi, Augustin J.(BATTELLE (PACIFIC NW LAB)); Chen, Guang (OMNIVIZ, INC); Crow, Vernon L.(BATTELLE (PACIFIC NW LAB)); Decker, Scott D.(BATTELLE (PACIFIC NW LAB)); Groch, Kevin M.(BATTELLE (PACIFIC NW LAB)); Havre, Susan L.(BATTELLE (PACIFIC NW LAB)); Malard, Joel (BATTELLE (PACIFIC NW LAB)); Martin, Tonya J.(BATTELLE (PACIFIC NW LAB)); Miller, Nancy E.(BATTELLE (PACIFIC NW LAB)); Monroe, Philip J.(OMNIVIZ, INC); Nowell, Lucy T.(BATTELLE (PACIFIC NW LAB)); Payne, Deborah A.(BATTELLE (PACIFIC NW LAB)); Reyes Spindola, Jorge F.(BATTELLE (PACIFIC NW LAB)); Scarberry, Randall E.(OMNIVIZ, INC); Sofia, Heidi J.(BATTELLE (PACIFIC NW LAB)); Stillwell, Lisa C.(OMNIVIZ, INC); Thomas, Gregory S.(BATTELLE (PACIFIC NW LAB)); Thurston, Sarah J.(OMNIVIZ, INC); Williams, Leigh K.(BATTELLE (PACIFIC NW LAB)); Zabriskie, Sean J.(OMNIVIZ, INC); MG Hicks

    2001-05-11

    The volumes and diversity of information in the discovery, development, and business processes within the chemical and life sciences industries require new approaches for analysis. Traditional list- or spreadsheet-based methods are easily overwhelmed by large amounts of data. Furthermore, generating strong hypotheses and, just as importantly, ruling out weak ones, requires integration across different experimental and informational sources. We have developed a framework for this integration, including common conceptual data models for multiple data types and linked visualizations that provide an overview of the entire data set, a measure of how each data record is related to every other record, and an assessment of the associations within the data set.

  20. Personal computer program and spreadsheet for calculating the coal mine roof rating (CMRR). Information circular/1994

    Energy Technology Data Exchange (ETDEWEB)

    Riefenberg, J.; Wuest, W.J.

    1994-01-01

    A family of personal computer programs that calculate the Coal Mine Roof Rating (CMRR) has been developed by the U.S. Bureau of Mines. The CMRR, a rock mass classification system, was recently developed by Bureau researchers to provide a link between the geologists' qualitative description of coal mine roof and the mine engineers' quantitative needs for mine design, roof support selection, and hazard detection. The program CMRR is a user-friendly, interactive program into which raw field data are input; a CMRR is calculated and output along with two graphic displays. The first graphic display is a plan view map with the roof ratings displayed on a color-coded scale, and the second shows a stratigraphic section of the bolted roof interval and its resultant roof rating. In addition, a Lotus 1-2-3 worksheet, BOM-CMRR.WK3, has been developed for easy storage of field data. The worksheet also includes macros developed for calculation and storage of the CMRR. Summary reports for analysis of site-specific information are readily generated using Lotus. These programs help engineers utilize the CMRR in ground control studies.

  1. Information and communication technology and climate change adaptation: Evidence from selected mining companies in South Africa

    Directory of Open Access Journals (Sweden)

    Bartholomew I. Aleke

    2016-03-01

    Full Text Available The mining sector is a significant contributor to the gross domestic product of many global economies. Given the increasing trends in climate-induced disasters and the growing desire to find lasting solutions, information and communication technology (ICT) has been introduced into the climate change adaptation mix. Climate change-induced extreme weather events such as flooding, drought, excessive fog, and cyclones have compounded the environmental challenges faced by the mining sector. This article presents the adoption of ICT innovation as part of the adaptation strategies towards reducing the mining sector's vulnerability and exposure to climate change disaster risks. Document analysis and systematic literature review were adopted as the methodology. Findings from the study reflect how ICT intervention orchestrated changes in communication patterns which are tailored towards the reduction in climate change vulnerability and exposure. The research concludes with a proposition that ICT intervention must be part of the bigger and ongoing climate change adaptation agenda in the mining sector. Keywords: ICT; climate change; disaster risk reduction; mining; adaptation; South Africa

  2. Data Mining Framework for Generating Sales Decision Making Information Using Association Rules

    OpenAIRE

    Md. Humayun Kabir

    2016-01-01

    The rapid technological development in the field of information and communication technology (ICT) has enabled the databases of super shops to be organized under a countrywide sales decision making network to develop intelligent business systems by generating enriched business policies. This paper presents a data mining framework for generating sales decision making information from sales data using association rules generated from valid user input item set with respect to the sales data unde...

  3. Text mining of web-based medical content

    CERN Document Server

    Neustein, Amy

    2014-01-01

    Text Mining of Web-Based Medical Content examines web mining for extracting useful information that can be used for treating and monitoring the healthcare of patients. This work provides methodological approaches to designing mapping tools that exploit data found in social media postings. Specific linguistic features of medical postings are analyzed vis-a-vis available data extraction tools for culling useful information.

  4. Extraction of Coupling Information From $Z' \\to jj$

    OpenAIRE

    Rizzo, T. G.

    1993-01-01

    An analysis by the ATLAS Collaboration has recently shown, contrary to popular belief, that a combination of strategic cuts, excellent mass resolution, and detailed knowledge of the QCD backgrounds from direct measurements can be used to extract a signal in the $Z' \\to jj$ channel in excess of $6\\sigma$ for certain classes of extended electroweak models. We explore the possibility that the data extracted from $Z$ dijet peak will have sufficient statistical power as to supply information on th...

  5. Web Crime Mining by Means of Data Mining Techniques

    Directory of Open Access Journals (Sweden)

    Javad Hosseinkhani

    2014-03-01

    Full Text Available The purpose of this study is to provide a review of mining useful information by means of data mining. Data mining is the procedure of extracting knowledge and information from large sets of data by applying artificial intelligence methods to find unseen relationships in the data. Data mining applications have attracted growing researcher attention, and one of the crucial fields applying data mining is criminology, where it is used for identifying crime characteristics. The crime analysis process involves detecting and exploring crimes and investigating their relationships with criminals. Criminology is a suitable field for data mining techniques given the high volume and complexity of relationships within crime datasets. Therefore, identifying crime characteristics is the first step for further analysis, and the knowledge obtained from data mining approaches is a very useful tool to help and support police forces. This research aims to provide a review of extracting useful information by means of data mining, in order to find crime hot spots and predict crime trends using crime data mining techniques.

  6. PDF text classification to leverage information extraction from publication reports.

    Science.gov (United States)

    Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha

    2016-06-01

    Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges to the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% higher than that of the machine learning classifier. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
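
    The multi-pass sieve idea can be sketched as an ordered list of rule passes, with the most precise pass applied first and the first pass that fires assigning the label. The Python rules below are simplified guesses for illustration, not the rules used in the paper.

    import re

    def sieve_metadata(t):
        if re.search(r"doi:|©|copyright|received .* accepted", t, re.I):
            return "METADATA"

    def sieve_title(t):
        if len(t.split()) <= 25 and t.istitle():
            return "TITLE"

    def sieve_abstract(t):
        if t.lower().startswith(("abstract", "summary:")):
            return "ABSTRACT"

    def sieve_semistructure(t):
        if re.search(r"table \d|figure \d|^\s*[\d.]+\s*$", t, re.I):
            return "SEMISTRUCTURE"

    SIEVES = [sieve_metadata, sieve_title, sieve_abstract, sieve_semistructure]

    def classify(snippet):
        for sieve in SIEVES:              # highest-precision passes run first
            label = sieve(snippet)
            if label:
                return label
        return "BODYTEXT"                 # default when no pass fires

    print(classify("Abstract Background: data extraction is time-consuming."))
    print(classify("Patients in the intervention arm improved significantly."))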

  7. PaperBLAST: Text Mining Papers for Information about Homologs.

    Science.gov (United States)

    Price, Morgan N; Arkin, Adam P

    2017-01-01

    Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.

  8. Data-Throughput Enhancement Using Data Mining-Informed Cognitive Radio

    Directory of Open Access Journals (Sweden)

    Khashayar Kotobi

    2015-03-01

    Full Text Available We propose the data mining-informed cognitive radio, which uses non-traditional data sources and data-mining techniques for decision making and improving the performance of a wireless network. To date, the application of information other than wireless channel data in cognitive radios has not been significantly studied. We use a novel dataset (Twitter traffic as an indicator of network load in a wireless channel. Using this dataset, we present and test a series of predictive algorithms that show an improvement in wireless channel utilization over traditional collision-detection algorithms. Our results demonstrate the viability of using these novel datasets to inform and create more efficient cognitive radio networks.
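
    As a rough illustration of the idea of using a non-traditional data source to inform channel selection, the sketch below picks the channel whose predicted load is lowest, where the load is derived from a hypothetical tweet-count proxy and a linear coefficient assumed to have been fitted offline. The numbers and the proxy model are assumptions for illustration, not the predictive algorithms evaluated in the paper.

```python
import numpy as np

# Hypothetical hourly tweet counts observed near each of three channels' cells,
# used as a proxy for expected network load (illustrative numbers only).
tweet_counts = np.array([120.0, 35.0, 80.0])

# Assume a simple linear proxy fitted offline: expected_load = a * tweets + b.
a, b = 0.004, 0.10
predicted_load = a * tweet_counts + b          # fraction of slots expected to be busy

best_channel = int(np.argmin(predicted_load))  # transmit where predicted load is lowest
print("predicted load per channel:", predicted_load.round(3))
print("data-mining-informed choice: channel", best_channel)
```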

  9. Local mine production safety supervision game analysis based on incomplete information

    Institute of Scientific and Technical Information of China (English)

    LI Xing-dong; LI Ying; REN Da-wei; LIU Zhao-xia

    2007-01-01

    Using the fundamental theory and analytical methods of repeated games with incomplete information, this paper introduces incomplete information into repeated games and establishes a two-stage dynamic game model of the local authority and the coal mine owner. The analysis indicates that, as long as the country establishes a corresponding rewards-and-punishments incentive mechanism for the local authority departments responsible for this work, coal mine safety accidents will be reported on time. The conclusion about whether the local government behaves cooperatively or non-cooperatively changes once incomplete information is introduced. Only if the local authority fulfils its responsibility can unsafe accidents be controlled effectively. Once this kind of cooperation by the local government appears, the country's costs of safety supervision and the difficulty of that supervision can decrease greatly.

  10. Preprocessing and Morphological Analysis in Text Mining

    Directory of Open Access Journals (Sweden)

    Krishna Kumar Mohbey; Sachin Tiwari

    2011-12-01

    Full Text Available This paper addresses the preprocessing activities that are performed by software or language translators before mining algorithms are applied to large volumes of data. Text mining is an important area of data mining and plays a vital role in extracting useful information from large databases or data warehouses. Before the text mining or information extraction process is applied, however, preprocessing is essential, because the given data or dataset may be noisy, incomplete, inconsistent, dirty and unformatted. In this paper we collect the necessary requirements for preprocessing; once the preprocessing task is complete, useful knowledge can easily be extracted using a mining strategy. The paper also covers the analysis of data, such as tokenization and stemming, and semantic analysis, such as phrase recognition and parsing, and describes how stemming, tokenization and parsing are applied.
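
    A minimal sketch of the preprocessing steps described here — tokenization, stop-word removal and stemming — using NLTK's Porter stemmer; the small stop-word list is an illustrative assumption rather than a complete one.

```python
import re
from nltk.stem import PorterStemmer  # pip install nltk

STOPWORDS = {"the", "is", "a", "an", "of", "and", "or", "to", "in"}  # illustrative subset

def preprocess(document: str) -> list:
    # Tokenization: lowercase and split on non-alphabetic characters.
    tokens = re.findall(r"[a-z]+", document.lower())
    # Stop-word removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stemming: reduce inflected forms to a common stem.
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]

print(preprocess("Text mining plays a vital role in extracting useful information."))
# e.g. ['text', 'mine', 'play', 'vital', 'role', 'extract', 'use', 'inform']
```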

  11. Ontology-Based Information Extraction for Business Intelligence

    Science.gov (United States)

    Saggion, Horacio; Funk, Adam; Maynard, Diana; Bontcheva, Kalina

    Business Intelligence (BI) requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers or feed statistical BI models and tools. The massive amount of information available to business analysts makes information extraction and other natural language processing tools key enablers for the acquisition and use of that semantic information. We describe the application of ontology-based extraction and merging in the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information. The results of our experiments so far are very promising and we are now in the process of building a complete end-to-end solution.

  12. Measures for the prevention of mine waste by reducing the prostrate cut in extraction operations with cutting extraction; Massnahmen zur Bergevermeidung durch Reduzierung des Liegendschnitts in Abbaubetrieben mit schneidender Gewinnung

    Energy Technology Data Exchange (ETDEWEB)

    Kroker, Juergen; Telsemeyer, Thomas [Bergwerk Auguste Victoria, RAG Deutsche Steinkohle AG, Marl (Germany); Rosinski, Dirk [Bergwerk Lippe, Gelsenkirchen (Germany)

    2009-11-05

    Within the scope of an investigation into the mines of RAG over a period of consideration from 1st January 2006 to 30 June 2007 it was established that approximately 25% of all accumulated mine waste is caused by prostrate additional cut in the extraction operations. A prostrate additional cut is required in the extraction operations for system-related reasons, in which the passage height of the extraction facilities is not guaranteed as a result of the loosening of the coal seam. The additional extraction of the footwall rock, which exceeds this subsequently defined extent, causes the preventable mine waste. Using the example of the extraction operation with cutting extraction in the Auguste Victoria and Lippe collieries it is shown which technical and organisational measures applying the Eicontrol system have successfully contributed to the reduction of the prostrate additional cut with the objective of preventing mine waste. (orig.)

  13. Longwall mining

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1995-03-14

    As part of EIA's program to provide information on coal, this report, Longwall Mining, describes longwall mining and compares it with other underground mining methods. Using data from EIA and private sector surveys, the report describes major changes in the geologic, technological, and operating characteristics of longwall mining over the past decade. Most important, the report shows how these changes led to dramatic improvements in longwall mining productivity. For readers interested in the history of longwall mining and greater detail on recent developments affecting longwall mining, the report includes a bibliography.

  14. Community perspectives of natural resource extraction: coal-seam gas mining and social identity in Eastern Australia

    Directory of Open Access Journals (Sweden)

    David Lloyd

    2013-01-01

    Full Text Available Using a recent case study of community reaction to proposed coal-seam gas mining in eastern Australia, we illustrate the role of community views in issues of natural resource use. Drawing on interviews, observations and workshops, the paper explores the anti-coal-seam gas social movement from its stages of infancy through to being a national debate linking community groups across and beyond Australia. Primary community concerns of inadequate community consultation translate into fears regarding potential impacts on farmland and cumulative impacts on aquifers and future water supply, and questions regarding economic, social and environmental benefits. Many of the community activists had not previously been involved in such social action. A recurring message from affected communities is concern around perceived insufficient research and legislation for such rapid industrial expansion. A common citizen demand is the cessation of the industry until there is better understanding of underground water system interconnectivity and the methane extraction and processing life cycle. Improved scientific knowledge of the industry and its potential impacts will, in the popular view, enable better comparison of power generation efficiency with coal and renewable energy sources and better comprehension of the industry as a transition energy industry. It will also enable elected representatives and policy makers to make more informed decisions while developing appropriate legislation to ensure a sustainable future.

  15. KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Directory of Open Access Journals (Sweden)

    Schomburg Dietmar

    2010-07-01

    Full Text Available Abstract Background The amount of available biological information is rapidly increasing and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so that methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat etc. as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions The presented algorithm delivers a considerable amount of information and therefore may help accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is completely based upon text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public License.
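
    Rule- and dictionary-based extraction of the kind described can be sketched with a single regular expression capturing a parameter name, a numeric value and a unit; the pattern and unit list below are illustrative assumptions, far simpler than the rules used to build KID.

```python
import re

# Illustrative pattern: parameter name, optional separators, value, unit.
PARAM = r"(Km|Ki|kcat|Kd|IC50|Vmax)"
VALUE = r"(\d+(?:\.\d+)?)"
UNIT = r"(mM|µM|uM|nM|s-1|min-1|U/mg)"
PATTERN = re.compile(PARAM + r"\s*(?:value)?\s*(?:of|=|:)?\s*" + VALUE + r"\s*" + UNIT)

abstract = ("The enzyme showed a Km of 0.25 mM for glucose and a kcat of 12 s-1; "
            "inhibition by compound 3 gave an IC50 = 40 µM.")

for name, value, unit in PATTERN.findall(abstract):
    print(name, value, unit)
# Km 0.25 mM / kcat 12 s-1 / IC50 40 µM
```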

  16. Security information processing system in Mitsui Miike Coal Mine

    Energy Technology Data Exchange (ETDEWEB)

    Ikeda, Tsutomu; Makino, Hideyuki (Mitsui Coal Mining Co., Ltd., Tokyo (Japan))

    1988-10-25

    According to one assessment, the previous security system failed to properly combine and evaluate the information reported and the data detected by sensors at the time of past disasters. An integrated data processing system has therefore been developed which incorporates an independent function to predict an accident and is capable of offering a guideline for taking the lead in evacuation and preventing the spread of a disaster. It can help avert disasters while supervising the operation of various equipment in the pit and the fluctuation of environmental conditions in the pit, including gas concentration and temperature, and it works to prevent the spread of a disaster in an emergency. The major features of the system are that it is designed to ignore the fluctuation of CO concentration at the time of normal blasting and to quickly and accurately catch any premonitory fluctuation when it detects an abnormal change. It combines the multiple data items indicating a possible accident, arranges them in order, and displays warning data to the workers on the CRT. When machines replace manual labour, the problem is that machines cannot make the systematic responses that are the most advantageous point of manual supervision. Development of software capable of making systematic judgements as a person does is hoped for in the future. 4 figures, 2 tables.

  17. Extracting an entanglement signature from only classical mutual information

    Energy Technology Data Exchange (ETDEWEB)

    Starling, David J.; Howell, John C. [Department of Physics and Astronomy, University of Rochester, Rochester, New York 14627 (United States); Broadbent, Curtis J. [Department of Physics and Astronomy, University of Rochester, Rochester, New York 14627 (United States); Rochester Theory Center, University of Rochester, Rochester, New York 14627 (United States)

    2011-09-15

    We introduce a quantity which is formed using classical notions of mutual information and which is computed using the results of projective measurements. This quantity constitutes a sufficient condition for entanglement and represents the amount of information that can be extracted from a bipartite system for spacelike separated observers. In addition to discussion, we provide simulations as well as experimental results for the singlet and maximally correlated mixed states.
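
    For reference, the classical mutual information between the outcomes A and B of local projective measurements is the standard Shannon quantity below; the entanglement witness described in the abstract is built from such classically computed terms (the precise combination used by the authors is not reproduced here).

```latex
I(A\!:\!B) \;=\; H(A) + H(B) - H(A,B),
\qquad
H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x)
```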

  18. Extracting clinical information to support medical decision based on standards.

    Science.gov (United States)

    Gomoi, Valentin; Vida, Mihaela; Stoicu-Tivadar, Lăcrămioara; Stoicu-Tivadar, Vasile

    2011-01-01

    The paper presents a method connecting medical databases to a medical decision system, and describes a service created to extract the necessary information that is transferred based on standards. The medical decision can be improved based on many inputs from different medical locations. The developed solution is described for a concrete case concerning the management for chronic pelvic pain, based on the information retrieved from diverse healthcare databases.

  19. Extracting an entanglement signature from only classical mutual information

    Science.gov (United States)

    Starling, David J.; Broadbent, Curtis J.; Howell, John C.

    2011-09-01

    We introduce a quantity which is formed using classical notions of mutual information and which is computed using the results of projective measurements. This quantity constitutes a sufficient condition for entanglement and represents the amount of information that can be extracted from a bipartite system for spacelike separated observers. In addition to discussion, we provide simulations as well as experimental results for the singlet and maximally correlated mixed states.

  20. THE METHODS OF EXTRACTING WATER INFORMATION FROM SPOT IMAGE

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Some techniques and methods for deriving water information from SPOT-4 (XI) imagery were investigated and discussed in this paper. An algorithm of decision-tree (DT) classification, which includes several classifiers based on the spectral responding characteristics of water bodies and other objects, was developed and put forward to delineate water bodies. Another algorithm of decision-tree classification based on both spectral characteristics and auxiliary information from DEM and slope (DTDS) was also designed for water-body extraction. In addition, the supervised classification method of maximum-likelihood classification (MLC) and the unsupervised method of the interactive self-organizing data analysis technique (ISODATA) were used to extract water bodies for comparison purposes. An index was designed and used to assess the accuracy of the different methods adopted in the research. Results have shown that water extraction accuracy varied with respect to the various techniques applied: it was low using ISODATA, very high using the DT algorithm, and much higher using both DTDS and MLC.
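
    The decision-tree (DT) idea can be sketched as a cascade of per-pixel spectral tests, optionally augmented with auxiliary terrain tests (DTDS); the bands, index and thresholds below are illustrative assumptions for SPOT-4-like green/red/NIR/SWIR reflectances, not the classifiers developed in the paper.

```python
def classify_pixel(green, red, nir, swir, slope=None):
    """Toy decision-tree classifier for water extraction (illustrative thresholds)."""
    ndwi = (green - nir) / (green + nir + 1e-6)   # water is bright in green, dark in NIR
    if ndwi > 0.0 and swir < 0.05:                # spectral tests (DT)
        # Optional auxiliary test (DTDS): water is unlikely on steep slopes.
        if slope is not None and slope > 10.0:
            return "shadow_or_other"
        return "water"
    if nir > 0.3:
        return "vegetation"
    return "other"

# Reflectance values below are made up for demonstration.
print(classify_pixel(green=0.12, red=0.08, nir=0.04, swir=0.02, slope=1.0))   # water
print(classify_pixel(green=0.06, red=0.05, nir=0.45, swir=0.20))              # vegetation
```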

  1. Metal speciation of historic and new copper mine tailings from Repparfjorden, Northern Norway, before and after acid, base and electrodialytic extraction

    DEFF Research Database (Denmark)

    Pedersen, Kristine B.; Jensen, Pernille Erland; Ottosen, Lisbeth M.

    2017-01-01

    … tailings. Substantial desorption (>40%) for both historic and new mine tailings occurred at pH values below 3 and above 12. These results, combined with metal speciation showing that the binding of Cu in the sediment changes around pH values 3 and 10, indicate potential for extraction of more Cu from … the new mine tailings. Electrodialysis, based on applying an electric field of low intensity to extract metals from polluted soils/sediments, was designed for acidic and alkaline extraction, and in both cases more Cu was extracted than in the pure acid/base extractions, while maintaining low mobilisation …

  2. Tumor information extraction in radiology reports for hepatocellular carcinoma patients

    Science.gov (United States)

    Yim, Wen-wai; Denman, Tyler; Kwan, Sharon W.; Yetisgen, Meliha

    2016-01-01

    Hepatocellular carcinoma (HCC) is a deadly disease affecting the liver for which there are many available therapies. Targeting treatments towards specific patient groups necessitates defining patients by stage of disease. Criteria for such stagings include information on tumor number, size, and anatomic location, typically only found in narrative clinical text in the electronic medical record (EMR). Natural language processing (NLP) offers an automatic and scalable means to extract this information, which can further evidence-based research. In this paper, we created a corpus of 101 radiology reports annotated for tumor information. Afterwards we applied machine learning algorithms to extract tumor information. Our inter-annotator partial match agreement scored 0.93 and 0.90 F1 for entities and relations, respectively. Based on the annotated corpus, our sequential labeling entity extraction achieved 0.87 F1 partial match, and our maximum entropy classification relation extraction achieved scores of 0.89 and 0.74 F1 with gold and system entities, respectively. PMID:27570686
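
    The relation-extraction step can be illustrated with a maximum-entropy (logistic-regression) classifier over simple features of an entity pair; the feature names and toy examples below are assumptions for illustration, not the feature set or corpus from the paper.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each example: features describing a (tumor, attribute) mention pair in a report
# sentence, with a label saying whether the relation holds. Toy data only.
examples = [
    ({"e1_type": "TUMOR", "e2_type": "SIZE", "token_dist": 2, "between_has_measure": True}, 1),
    ({"e1_type": "TUMOR", "e2_type": "SIZE", "token_dist": 14, "between_has_measure": False}, 0),
    ({"e1_type": "TUMOR", "e2_type": "LOCATION", "token_dist": 3, "between_has_prep": True}, 1),
    ({"e1_type": "TUMOR", "e2_type": "LOCATION", "token_dist": 20, "between_has_prep": False}, 0),
]

vec = DictVectorizer()
X = vec.fit_transform([features for features, _ in examples])
y = [label for _, label in examples]

clf = LogisticRegression(max_iter=1000).fit(X, y)   # maximum-entropy classifier

test = {"e1_type": "TUMOR", "e2_type": "SIZE", "token_dist": 1, "between_has_measure": True}
print("relation probability:", clf.predict_proba(vec.transform([test]))[0, 1])
```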

  3. Information Extraction and Linking in a Retrieval Context

    NARCIS (Netherlands)

    Moens, M.F.; Hiemstra, Djoerd

    We witness a growing interest and capabilities of automatic content recognition (often referred to as information extraction) in various media sources that identify entities (e.g. persons, locations and products) and their semantic attributes (e.g., opinions expressed towards persons or products,

  4. Spatiotemporal Information Extraction from a Historic Expedition Gazetteer

    Directory of Open Access Journals (Sweden)

    Mafkereseb Kassahun Bekele

    2016-11-01

    Full Text Available Historic expeditions are events that are flavored by exploratory, scientific, military or geographic characteristics. Such events are often documented in literature, journey notes or personal diaries. A typical historic expedition involves multiple site visits, and their descriptions contain spatiotemporal and attributive contexts. Expeditions involve movements in space that can be represented by triplet features (location, time and description). However, such features are implicit and innate parts of textual documents. Extracting the geospatial information from these documents requires understanding the contextualized entities in the text. To this end, we developed a semi-automated framework that has multiple Information Retrieval and Natural Language Processing components to extract the spatiotemporal information from a two-volume historic expedition gazetteer. Our framework has three basic components, namely, the Text Preprocessor, the Gazetteer Processing Machine and the JAPE (Java Annotation Pattern Engine) Transducer. We used the Brazilian Ornithological Gazetteer as an experimental dataset and extracted the spatial and temporal entities from entries that refer to three expeditioners’ site visits (which took place between 1910 and 1926) and mapped the trajectory of each expedition using the extracted information. Finally, one of the mapped trajectories was manually compared with a historical reference map of that expedition to assess the reliability of our framework.

  5. Study of Cu and Pb partitioning in mine tailings using the Tessier sequential extraction scheme

    Science.gov (United States)

    Andrei, Mariana Lucia; Senila, Marin; Hoaghia, Maria Alexandra; Borodi, Gheorghe; Levei, Erika-Andrea

    2015-12-01

    The Cu and Pb partitioning in nonferrous mine tailings was investigated using the Tessier sequential extraction scheme. The contents of Cu and Pb found in the five operationally defined fractions were determined by inductively coupled plasma optical emission spectrometry. The results showed different partitioning patterns for Cu and Pb in the studied tailings. The total Cu and Pb contents were higher in tailings from Brazesti than in those from Saliste, while the Cu contents in the first two fractions considered as mobile were comparable and the content of mobile Pb was the highest in Brazesti tailings. In the tailings from Saliste about 30% of Cu and 3% of Pb were found in exchangeable fraction, while in those from Brazesti no metals were found in the exchangeable fraction, but the percent of Cu and Pb found in the bound to carbonate fraction were high (20% and 26%, respectively). The highest Pb content was found in the residual fraction in Saliste tailings and in bound to Fe and Mn oxides fraction in Brazesti tailings, while the highest Cu content was found in the fraction bound to organic matter in Saliste tailings and in the residual fraction in Brazesti tailings. In case of tailings of Brazesti medium environmental risk was found both for Pb and Cu, while in case of Saliste tailings low risk for Pb and high risk for Cu were found.

  6. Preparatory information for third molar extraction: does preference for information and behavioral involvement matter?

    NARCIS (Netherlands)

    van Wijk, A.J.; Buchanan, H.; Coulson, N.; Hoogstraten, J.

    2010-01-01

    Objective: The objectives of the present study were to: (1) evaluate the impact of high versus low information provision in terms of anxiety towards third molar extraction (TME) as well as satisfaction with information provision. (2) Investigate how preference for information and behavioral

  7. Homeland situation awareness through mining and fusing heterogeneous information from intelligence databases and field sensors

    Science.gov (United States)

    Digioia, Giusj; Panzieri, Stefano

    2012-06-01

    One of the most keenly felt issues in the defence domain is that of having huge quantities of data stored in databases and acquired from field sensors without being able to infer information from them. Usually databases are continuously updated with observations and contain heterogeneous data. Deep and continuous analysis of the data could mine useful correlations, explain relations existing among data and cue searches for further evidence. The solution to this problem involves both the domain of Data Mining and the domain of high-level Data Fusion, that is, Situation Assessment, Threat Assessment and Process Refinement, also synthesised as Situation Awareness. The focus of this paper is the definition of an architecture for a system adopting data mining techniques to adaptively discover clusters of information and relations among them, to classify acquired observations, and to use the derived knowledge model and classification in order to assess situations and threats and refine the search for evidence. The sources of information taken into account are those related to the intelligence domain, such as IMINT, HUMINT, ELINT, COMINT and other non-conventional sources. The algorithms applied refer to unsupervised and supervised classification for rule exploitation, and adaptively built Hidden Markov Models for situation and threat assessment.

  8. Imprinted magnetic graphene oxide for the mini-solid phase extraction of Eu (III) from coal mine area

    Science.gov (United States)

    Patra, Santanu; Roy, Ekta; Madhuri, Rashmi; Sharma, Prashant K.

    2017-05-01

    The present work describes the preparation of an imprinted magnetic reduced graphene oxide and its application for the selective removal of Eu(III) from a local coal mining area. A simple solid-phase extraction method was used for this purpose. The material shows very high adsorption and removal efficiency towards Eu(III), which suggests that it has the potential to be used in future real-world applications for the removal of Eu(III) from complex matrices.

  9. Extraction and haulage technology in the Namibian diamond mining industry; Gewinnungs- und Foerdertechnik im Diamantbergbau Namibias

    Energy Technology Data Exchange (ETDEWEB)

    Mischo, Helmut [Namibia's Univ. of Science and Technology, Windhuk (NM). Lehrstuhl fuer Betriebsmittel und bergbauliche Verfahrenslehre

    2011-01-15

    For more than 100 years, diamonds have been mined in Namibia in the Sperrgebiet and along the coast. With the depletion of the on-shore deposits, all mining activities will soon move to coastal and off-shore operations, which are already of high importance today. The paper provides a detailed overview of the applied mining and conveying technology in the different on- and off-shore deposits. (orig.)

  10. Extending a geocoding database by Web information extraction

    Science.gov (United States)

    Wu, Yunchao; Niu, Zheng

    2008-10-01

    Local Search has recently attracted much attention. The popular architecture for Local Search is map-and-hyperlinks, which links geo-referenced Web content to a map interface. This architecture shows that a good Local Search depends not only on search engine techniques, but also on a perfect geocoding database. The process of building and updating a geocoding database is laborious and time consuming, so it is usually difficult to keep up with changes in the real world. However, the Web provides a rich resource of location-related information, which can serve as a supplementary information source for geocoding. Therefore, this paper introduces how to extract geographic information from Web documents to extend a geocoding database. Our approach involves two major steps. First, geographic named entities are identified and extracted from Web content. Then, the named entities are geocoded and put into storage. In this way, we can extend a geocoding database to provide better local Web search services.
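
    The two steps described — recognising geographic named entities in Web text and then geocoding them — can be sketched as follows; the tiny gazetteer dictionary and the capitalised-phrase heuristic are illustrative stand-ins for a real named-entity recogniser and geocoding database.

```python
import re

# Toy gazetteer standing in for the geocoding database to be extended.
GAZETTEER = {
    "Beijing": (39.9042, 116.4074),
    "Shanghai": (31.2304, 121.4737),
    "West Lake": (30.2420, 120.1490),
}

def extract_place_candidates(text):
    # Naive NER: runs of capitalised words are treated as candidate place names.
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text)

def geocode(text):
    records = []
    for candidate in extract_place_candidates(text):
        if candidate in GAZETTEER:
            lat, lon = GAZETTEER[candidate]
            records.append({"name": candidate, "lat": lat, "lon": lon, "source": "web"})
    return records

page = "Our new cafe has opened near West Lake, a short train ride from Shanghai."
print(geocode(page))
# [{'name': 'West Lake', ...}, {'name': 'Shanghai', ...}]
```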

  11. The Study on Information Extraction Technology of Seismic Damage

    Directory of Open Access Journals (Sweden)

    Huang Zuo-wei

    2013-01-01

    Full Text Available In order to improve the information extraction technology for seismic damage assessment and the publishing of earthquake damage information, and based on past earthquake experience, a technical workflow for rapid earthquake damage assessment was constructed. Taking the Yushu earthquake as an example, this study examines the framework and establishment of the information service system by means of ArcIMS and distributed database technology. It analyses some key technologies and builds a web publishing architecture for massive remote sensing images. The system implements the joint application of remote sensing image processing technology, database technology and Web GIS technology; the result can provide an important basis for earthquake damage assessment, emergency management and rescue missions.

  12. Extraction of Information from Images using Dewrapping Techniques

    Directory of Open Access Journals (Sweden)

    Khalid Nazim S. A.

    2010-11-01

    Full Text Available An image containing textual information is called a document image. The textual information in document images is useful in areas such as vehicle number plate reading, passport reading, cargo container reading and so on. Thus, extracting useful textual information from the document image plays an important role in many applications. One of the major challenges in camera-based document analysis is dealing with wrap and perspective distortions. In spite of the prevalence of dewrapping techniques, there is no standard, efficient algorithm for performance evaluation that concentrates on visualization. Wrapping is a common appearance of a document image before recognition. In order to capture the document images, a mobile camera of 2-megapixel resolution is used. A database is developed with variations in background, size and colour, along with wrapped, blurred and clean images. This database is explored and text extraction from those document images is performed. In the case of wrapped images, no efficient dewrapping techniques have been implemented to date, so extracting the text from the wrapped images is done by maintaining a suitable template database. Further, the extracted text from the wrapped or other document images is converted into an editable form such as Notepad or an MS Word document. The experimental results were corroborated on various objects of the database.

  13. Rapid automatic keyword extraction for information retrieval and analysis

    Science.gov (United States)

    Rose, Stuart J [Richland, WA]; Cowley, Wendy E [Richland, WA]; Crow, Vernon L [Richland, WA]; Cramer, Nicholas O [Richland, WA]

    2012-03-06

    Methods and systems for rapid automatic keyword extraction for information retrieval and analysis. Embodiments can include parsing words in an individual document by delimiters, stop words, or both in order to identify candidate keywords. Word scores for each word within the candidate keywords are then calculated based on a function of co-occurrence degree, co-occurrence frequency, or both. Based on a function of the word scores for words within the candidate keyword, a keyword score is calculated for each of the candidate keywords. A portion of the candidate keywords are then extracted as keywords based, at least in part, on the candidate keywords having the highest keyword scores.
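
    The scoring described in the abstract — candidate phrases split on stop words and punctuation, word scores from co-occurrence degree and frequency, and keyword scores as sums of word scores — can be sketched as below; the stop-word list is an illustrative subset.

```python
import re
from collections import defaultdict

STOPWORDS = {"and", "of", "the", "a", "in", "for", "is", "are", "on", "with", "to"}  # subset

def candidate_phrases(text):
    # Split on punctuation, then break word runs at stop words.
    phrases, current = [], []
    for token in re.findall(r"[a-zA-Z]+|[.,;:!?]", text.lower()):
        if token in STOPWORDS or not token.isalpha():
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(current)
    return phrases

def rake(text):
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)          # co-occurrence degree within the phrase
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Keyword score of a candidate phrase = sum of its word scores.
    return sorted(((" ".join(p), sum(word_score[w] for w in p)) for p in phrases),
                  key=lambda item: -item[1])

text = "Rapid automatic keyword extraction for information retrieval and analysis of documents."
for phrase, score in rake(text)[:3]:
    print(round(score, 1), phrase)
```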

  14. Extracting Semantic Information from Visual Data: A Survey

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2016-03-01

    Full Text Available The traditional environment maps built by mobile robots include both metric ones and topological ones. These maps are navigation-oriented and not adequate for service robots to interact with or serve human users who normally rely on the conceptual knowledge or semantic contents of the environment. Therefore, the construction of semantic maps becomes necessary for building an effective human-robot interface for service robots. This paper reviews recent research and development in the field of visual-based semantic mapping. The main focus is placed on how to extract semantic information from visual data in terms of feature extraction, object/place recognition and semantic representation methods.

  15. Abstract Information Extraction From Consumer's Comments On Internet Media

    Directory of Open Access Journals (Sweden)

    Kadriye Ergün

    2013-01-01

    Full Text Available In this study, a system developed to summarize by automatically evaluating comments about product with using text mining techniques will be described. The data has been primarily went through morphological analysis process, because they are texts written in natural language. Words and adjectives meaning positive or negative are determined. They show product features in texts. The tree structure is established according to Turkish grammar rules as subordinate and modified words are designated. The software which uses the depth-first search algorithm on the tree structure is developed. Data from result of software is stored in the SQL database. When any inquiry is made from these data depending on any property of product, numerical information which indicates the degree of satisfaction about this property is obtained.

  16. Information extraction approaches to unconventional data sources for "Injury Surveillance System": the case of newspapers clippings.

    Science.gov (United States)

    Berchialla, Paola; Scarinzi, Cecilia; Snidero, Silvia; Rahim, Yousif; Gregori, Dario

    2012-04-01

    Injury Surveillance Systems based on traditional hospital records or clinical data have the advantage of being a well-established, highly reliable source of information for active surveillance of specific injuries, like choking in children. However, they suffer the drawback of delays in making data available for analysis, due to inefficiencies in data collection procedures. In this sense, the integration of clinically based registries with unconventional data sources like newspaper articles has the advantage of making the system more useful for early alerting. Usage of such sources is difficult since information is only available in the form of free natural-language documents rather than the structured databases required by traditional data mining techniques. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles related to choking in children due to ingestion/inhalation of a foreign body, we compared the performance of three IE algorithms: (a) a classical rule-based system which requires manual annotation of the rules; (b) a rule-based system which allows for the automatic building of rules; and (c) a machine learning method based on Support Vector Machines. Although some useful indications are extracted from the newspaper clippings, this approach is at this time far from being routinely implemented for injury surveillance purposes.
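
    The third approach compared in the paper, a Support Vector Machine, can be illustrated with a standard bag-of-words text classifier that flags newspaper sentences as containing foreign-body injury information or not; the toy training sentences are assumptions for illustration, not the annotated Italian corpus used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy examples standing in for annotated newspaper clippings.
sentences = [
    "A three-year-old child choked on a peanut and was taken to hospital",
    "The boy inhaled a small toy part while playing at home",
    "The city council approved the new budget yesterday",
    "Local team wins the regional football championship",
]
labels = [1, 1, 0, 0]   # 1 = relevant to foreign-body injury, 0 = not relevant

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(sentences, labels)

# With a realistically sized annotated corpus the classifier separates relevant reports.
print(model.predict(["A toddler swallowed a coin during dinner"]))
```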

  17. Report: Congressionally Requested Information on the Status and Length of Review for Appalachian Surface Mining Permit Applications

    Science.gov (United States)

    Report #12-P-0083, November 21, 2011. After reconciling discrepancies and vetting information, we identified 185 surface mining permit applications to review from the list of 237 that we received from the senator.

  18. Advanced applications of natural language processing for performing information extraction

    CERN Document Server

    Rodrigues, Mário

    2015-01-01

    This book explains how to create information extraction (IE) applications that are able to tap the vast amount of relevant information available in natural language sources: Internet pages, official documents such as laws and regulations, books and newspapers, and the social web. Readers are introduced to the problem of IE and its current challenges and limitations, supported with examples. The book discusses the need to fill the gap between documents, data, and people, and provides a broad overview of the technology supporting IE. The authors present a generic architecture for developing systems that are able to learn how to extract relevant information from natural language documents, and illustrate how to implement working systems using state-of-the-art and freely available software tools. The book also discusses concrete applications illustrating IE uses.   ·         Provides an overview of state-of-the-art technology in information extraction (IE), discussing achievements and limitations for t...

  19. Robust Vehicle and Traffic Information Extraction for Highway Surveillance

    Directory of Open Access Journals (Sweden)

    C.-C. Jay Kuo

    2005-08-01

    Full Text Available A robust vision-based traffic monitoring system for vehicle and traffic information extraction is developed in this research. It is challenging to maintain detection robustness at all times for a highway surveillance system. There are three major problems in detecting and tracking a vehicle: (1) the moving cast shadow effect, (2) the occlusion effect, and (3) nighttime detection. For moving cast shadow elimination, a 2D joint vehicle-shadow model is employed. For occlusion detection, a multiple-camera system is used to detect occlusion so as to extract the exact location of each vehicle. For vehicle nighttime detection, a rear-view monitoring technique is proposed to maintain tracking and detection accuracy. Furthermore, we propose a method to improve the accuracy of background extraction, which usually serves as the first step in any vehicle detection processing. Experimental results are given to demonstrate that the proposed techniques are effective and efficient for vision-based highway surveillance.
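
    Background extraction, noted above as the usual first step in vehicle detection, can be sketched with OpenCV's built-in Gaussian-mixture background subtractor; the video filename is a placeholder, and this generic baseline is not the improved background-extraction method proposed by the authors.

```python
import cv2

cap = cv2.VideoCapture("highway.mp4")            # placeholder path to a traffic video
backsub = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = backsub.apply(frame)               # moving pixels (vehicles, shadows) become foreground
    fg_mask = cv2.medianBlur(fg_mask, 5)         # remove speckle noise
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    vehicles = [c for c in contours if cv2.contourArea(c) > 500]   # crude size filter
    print("candidate vehicles in frame:", len(vehicles))

cap.release()
```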

  20. Mining Association Rules in Big Data for E-healthcare Information System

    Directory of Open Access Journals (Sweden)

    N. Rajkumar

    2014-08-01

    Full Text Available Big data relates to large volumes, multiple ways of growing data sets, and autonomous sources. Big data is now expanding quickly in many advanced domains because of rapid growth in networking and data collection. This study defines an E-Healthcare Information System, which needs a logical and structured method of approaching knowledge, as well as effective preparation and control of the data generated during diagnosis activities in medical applications, by sharing information among E-Healthcare Information System devices. The main objective is an E-Healthcare Information System that is an extensive, integrated knowledge system designed to control all views of a hospital's operation, such as medical data, administrative, financial and legal information, and the corresponding service processing. Finally, the analysis results are generated using association mining techniques applied to the big data of hospital information datasets, and the mining results are evaluated in terms of accuracy, precision, recall and positive rate.
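
    A minimal sketch of the association-rule mining mentioned here, using a brute-force Apriori-style pass over hypothetical patient-record itemsets; the transactions, thresholds and item names are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical itemsets drawn from e-healthcare records (illustrative only).
transactions = [
    {"hypertension", "diabetes", "statin"},
    {"hypertension", "statin"},
    {"diabetes", "statin"},
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "statin"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.7

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Frequent itemsets up to size 3 (brute-force enumeration rather than full Apriori pruning).
items = sorted({i for t in transactions for i in t})
frequent = [frozenset(c) for k in (1, 2, 3)
            for c in combinations(items, k) if support(set(c)) >= MIN_SUPPORT]

# Rules X -> Y with confidence = support(X ∪ Y) / support(X).
for itemset in (f for f in frequent if len(f) > 1):
    for k in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, k)):
            rhs = itemset - lhs
            conf = support(itemset) / support(lhs)
            if conf >= MIN_CONFIDENCE:
                print(set(lhs), "->", set(rhs),
                      "support=%.2f confidence=%.2f" % (support(itemset), conf))
```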

  1. Reference Information Extraction and Processing Using Random Conditional Fields

    Directory of Open Access Journals (Sweden)

    Tudor Groza

    2012-06-01

    Full Text Available Fostering both the creation and the linking of data with the aim of supporting the growth of the Linked Data Web requires us to improve the acquisition and extraction mechanisms of the underlying semantic metadata. This is particularly important for the scientific publishing domain, where currently most of the datasets are being created in an author-driven, manual manner. In addition, such datasets capture only fragments of the complete metadata, usually omitting important elements such as the references, although they represent valuable information. In this paper we present an approach that aims at dealing with this aspect of extraction and processing of reference information. The experimental evaluation shows that, currently, our solution handles diverse types of reference format very well, thus making it usable for, or adaptable to, any area of scientific publishing.

  2. Information extraction from the GER 63-channel spectrometer data

    Science.gov (United States)

    Kiang, Richard K.

    1993-09-01

    The unprecedented data volume in the era of NASA's Mission to Planet Earth (MTPE) demands innovative information extraction methods and advanced processing techniques. Neural network techniques, which are intrinsic to distributed parallel processing and have shown promising results in analyzing remotely sensed data, could become essential tools in the MTPE era. To evaluate the information content of data with higher dimension and the usefulness of neural networks in analyzing them, measurements from the GER 63-channel airborne imaging spectrometer over Cuprite, Nevada, are used. The data are classified with 3-layer Perceptrons of various architectures. It is shown that the neural network can achieve a level of performance similar to conventional methods, without the need for an explicit feature extraction step.
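
    A 3-layer perceptron of the kind mentioned (one hidden layer, trained on per-pixel spectra) can be sketched with scikit-learn; the random spectra and labels below are stand-ins for the 63-channel GER measurements, not the Cuprite dataset.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_pixels, n_channels, n_classes = 300, 63, 4

# Stand-in data: random 63-channel spectra with class-dependent offsets.
y = rng.integers(0, n_classes, size=n_pixels)
X = rng.normal(size=(n_pixels, n_channels)) + y[:, None] * 0.5

clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000, random_state=0)  # one hidden layer
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```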

  3. Patterns and security technologies for co-extraction of coal and gas in deep mines without entry pillars

    Institute of Scientific and Technical Information of China (English)

    Nong Zhang; Fei Xue; Nianchao Zhang; Xiaowei Feng

    2015-01-01

    Retaining gob-side entryways and the stability of gas drainage boreholes are two essential techniques in the co-extraction of coal and gas without entry pillars (CECGWEP). However, retained entryways located in deep coal mines are hard to maintain, especially for constructing boreholes in confined spaces, owing to major deformations. Consequently, it is difficult to drill boreholes and maintain their stability, which therefore cannot guarantee the effectiveness of gas drainage. This paper presents three measures for conducting CECGWEP in deep mines on the basis of effective space in retained entryways for gas drainage. They are combinations of retaining roadways and face-lagging inclined boreholes, retaining roadways and face-advancing inclined boreholes, and retaining roadways and high return airway inclined boreholes. Several essential techniques are suggested to improve the maintenance of retained entryways and the stabilization of boreholes. For the particular cases considered in this study, two field trials have verified the latter two measures from the results obtained from the faces 1111(1) and 11112(1) in the Zhuji Mine. The results indicate that these models can effectively solve the problems in deep mines. The maximum gas drainage flow for a single hole can reach 8.1 m3/min and the effective drainage distance can be extended up to 150 m or more.

  4. Extracting Firm Information from Administrative Records: The ASSD Firm Panel

    OpenAIRE

    Fink, Martina; Segalla, Esther; Weber, Andrea; Zulehner, Christine

    2010-01-01

    This paper demonstrates how firm information can be extracted from administrative social security records. We use the Austrian Social Security Database (ASSD) and derive firms from employer identifiers in the universe of private sector workers. To correctly pin down entries and exits we use a worker flow approach which follows clusters of workers as they move across administrative entities. This procedure enables us to define different types of entry and exit such as start-ups, spinoffs, closur...

  5. OCR++: A Robust Framework For Information Extraction from Scholarly Articles

    OpenAIRE

    Singh, Mayank; Barua, Barnopriyo; Palod, Priyank; Garg, Manvi; Satapathy, Sidhartha; Bushi, Samuel; Ayush, Kumar; Rohith, Krishna Sai; Gamidi, Tulasi; Goyal, Pawan; Mukherjee, Animesh

    2016-01-01

    This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles including metadata (title, author names, affiliation and e-mail), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). We analyze a diverse set of scientific articles written in English language to understand generic writing patterns and formulate rules to develop this hybri...

  6. A new method for precursory information extraction: Slope-difference information method

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    A new method for precursory information extraction, the slope-difference information method, is proposed in this paper for daily-mean-value precursory data sequences. Taking Tangshan station as an example, a calculation on the full-time-domain leveling data is made, tested and compared with several other methods. The results indicate that the method is very effective for extracting short-term precursory information from the daily mean values after optimization. It is therefore valuable for popularization and application.

  7. Extraction of hidden information by efficient community detection in networks

    CERN Document Server

    Lee, Juyong; Lee, Jooyoung

    2012-01-01

    Currently, we are overwhelmed by a deluge of experimental data, and network physics has the potential to become an invaluable method to increase our understanding of large interacting datasets. However, this potential is often unrealized for two reasons: uncovering the hidden community structure of a network, known as community detection, is difficult, and further, even if one has an idea of this community structure, it is not a priori obvious how to efficiently use this information. Here, to address both of these issues, we first identify the optimal community structure of given networks in terms of modularity by utilizing a recently introduced community detection method. Second, we develop an approach to use this community information to extract hidden information from a network. When applied to a protein-protein interaction network, the proposed method outperforms current state-of-the-art methods that use only the local information of a network. The method is generally applicable to networks from many areas.
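
    For reference, the modularity maximised by such community-detection methods is the standard quantity below, where A_ij is the adjacency matrix, k_i the degree of node i, m the number of edges and c_i the community assignment of node i.

```latex
Q \;=\; \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```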

  8. A Semantic Approach for Geospatial Information Extraction from Unstructured Documents

    Science.gov (United States)

    Sallaberry, Christian; Gaio, Mauro; Lesbegueries, Julien; Loustau, Pierre

    Local cultural heritage document collections are characterized by their content, which is strongly attached to a territory and its land history (i.e., geographical references). Our contribution aims at making the content retrieval process more efficient whenever a query includes geographic criteria. We propose a core model for a formal representation of geographic information. It takes into account characteristics of different modes of expression, such as written language, captures of drawings, maps, photographs, etc. We have developed a prototype that fully implements geographic information extraction (IE) and geographic information retrieval (IR) processes. All PIV prototype processing resources are designed as Web Services. We propose a geographic IE process based on semantic treatment as a supplement to classical IE approaches. We implement geographic IR by using intersection computing algorithms that seek out any intersection between formal geocoded representations of geographic information in a user query and similar representations in document collection indexes.

  9. Options for compiling an inventory of mining waste sites throughout Europe by combining Landsat-TM derived information with national and pan-European thematic data sets

    Science.gov (United States)

    Vijdea, Anca-Marina; Sommer, Stefan

    2004-10-01

    Presently no reliable synoptic picture of number, extent, distribution and emissions from mining waste sites exists, neither for EU member states, nor for the Accession and Candidate Countries. At EU level, this information is needed to assess the large range of environmental impacts caused by mining wastes and their emissions in a coherent way across the different policies addressing the protection and sustainable use of environmental resources. The core task lies in the harmonised collection and standardised compilation and evaluation of existing data and in connecting them to a geographical reference system compatible with other European data sets. In the proposed approach information from national registers of mining wastes is linked to related standardized spatial data layers such as CORINE Land Cover (the classes of mineral extraction sites, dump sites) or other data sets available in the EUROSTAT GISCO data base, thus adding the spatial dimension at regional scale. Higher level of spatial detail and distinction between mineral extraction site and waste sites with or without accumulation of potentially hazardous material is added by remote sensing, applying a semi-automated principal component analysis (PCA) to selected spectral channels of geo-referenced Landsat-TM full scenes. The method was demonstrated on large areas covering approximately 120000 km2 of Slovakia and Romania and was validated against mining-related features from Pan-European and/or national databases, detailed geological maps, mineral resource maps, as well as by a GIS analysis showing the distribution of anomalous pixels in the above-mentioned features compared to the main land cover classes.

  10. Performance mining equipment (extraction-load-transportation) in the Ernesto Guevara Factory

    National Research Council Canada - National Science Library

    Orlando Belete Fuentes; Severo Estenoz-Mejía; Yoandro Diéguez-García

    2016-01-01

    The general efficiency of the mining work of the outburst fronts in the locations of the Company Ernesto Guevara of Moa is below the established average productivities of exploitation for each one...

  11. 30 CFR 942.779 - Surface mining permit applications-Minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 942.779 Section 942.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  12. 30 CFR 941.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 941.779 Section 941.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  13. 30 CFR 910.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 910.779 Section 910.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  14. 30 CFR 905.779 - Surface mining permit applications-Minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 905.779 Section 905.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. (a) Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  15. 30 CFR 937.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 937.779 Section 937.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  16. 30 CFR 922.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 922.779 Section 922.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  17. 30 CFR 912.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 912.779 Section 912.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  18. 30 CFR 933.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 933.779 Section 933.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  19. 30 CFR 903.779 - Surface mining permit applications-Minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 903.779 Section 903.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. (a) Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, applies to any person who...

  20. 30 CFR 921.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 921.779 Section 921.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  1. 30 CFR 939.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 939.779 Section 939.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  2. 30 CFR 947.779 - Surface mining permit applications-minimum requirements for information on environmental resources.

    Science.gov (United States)

    2010-07-01

    ... requirements for information on environmental resources. 947.779 Section 947.779 Mineral Resources OFFICE OF... requirements for information on environmental resources. Part 779 of this chapter, Surface Mining Permit Applications—Minimum Requirements for Information on Environmental Resources, shall apply to any person...

  3. [Retrieval of Copper Pollution Information from Hyperspectral Satellite Data in a Vegetation Cover Mining Area].

    Science.gov (United States)

    Qu, Yong-hua; Jiao, Si-hong; Liu, Su-hong; Zhu, Ye-qing

    2015-11-01

    Heavy metal mining activities have had complex effects on the ecological environment of mining regions. For example, large amounts of acidic waste water containing heavy metal ions are produced in the process of copper mining, which can cause serious pollution in the surrounding ecological environment. In previous research, bare soil has mainly been taken as the research target when monitoring environmental pollution, and the effects of land surface vegetation have been ignored. It is well known that vegetation condition is one of the most important indicators of ecological change in a region, and there is a significant linkage between vegetation spectral characteristics and heavy metals: when vegetation is affected by heavy metal pollution, its physiological behaviour responds to the change in the physiological ecology of its growing environment. Conventional methods, which often rely on large amounts of field survey data and laboratory chemical analysis, are time consuming and require considerable material resources. Spectrum analysis using remote sensing technology can acquire information on the heavy metal content in vegetation without touching it. However, retrieving that information from hyperspectral data is not easy, owing to the difficulty of identifying, out of a huge number of hyperspectral bands, the specific band that is sensitive to a specific heavy metal; the selection of the sensitive band is thus the key to the spectrum analysis method. This paper proposes a statistical analysis method to find the feature bands sensitive to heavy metal ions in hyperspectral data and to then retrieve the metal content using field survey data and hyperspectral images from the China Environment Satellite HJ-1. The method selected the copper ion content in leaves as the indicator of copper pollution.
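
    One simple way to select bands sensitive to copper content, in the spirit of the statistical analysis described, is to rank bands by the absolute correlation between their reflectance and the measured leaf copper content across field samples; the synthetic reflectance matrix, band count and copper values below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_bands = 40, 115            # band count chosen arbitrarily for the sketch

# Stand-in data: random reflectances, with band 60 made to co-vary with Cu content.
cu_content = rng.uniform(5, 60, size=n_samples)             # mg/kg in leaves (synthetic)
reflectance = rng.normal(0.3, 0.05, size=(n_samples, n_bands))
reflectance[:, 60] -= 0.002 * cu_content                     # simulated sensitivity

# Correlation of every band with the copper content, then rank by |r|.
corr = np.array([np.corrcoef(reflectance[:, b], cu_content)[0, 1] for b in range(n_bands)])
sensitive_bands = np.argsort(-np.abs(corr))[:5]
print("most Cu-sensitive bands:", sensitive_bands, "|r| =", np.abs(corr[sensitive_bands]).round(2))
```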

  4. Disposal and improvement of land contaminated by waste from copper mining extraction in Chile

    Science.gov (United States)

    Naranjo Lamilla, Pedro; Blanco Fernández, David; Díaz González, Marcos; Robles Castillo, Marcelo; Decinti Weiss, Alejandra; Tapia Alvarez, Carolina; Pardo Fabregat, Francisco; Vidal, Manuel Miguel Jordan; Bech, Jaume; Roca, Nuria

    2016-04-01

    This project originated from the needs of a mining company that mines and processes copper ore. High-purity copper is produced, with an annual production of 1,113,928 tons of concentrate at a grade of 32%. The company has generated several illegal landfills and has been forced by the government to establish an Industrial Solid Waste (ISW) management center. The forecast volume of waste generated is 20,000 tons/year. Chemical analysis established that the studied soil has a high copper content, caused either by natural background or by the spread of contaminants from mining activities. Moreover, in some sectors, soil contamination by mercury, hydrocarbons, and oils and fats was detected, likely associated with the accumulation of waste. The waters are also impacted by the mining operations, with elevated copper, molybdenum, manganese and sulfate concentrations and an acidic pH. The ISW management center mitigates the pollution of soil and water by concentrating all waste activities in a technically suitable place. The center provides the guidelines needed for the treatment and disposal of soil contamination caused by uncontrolled landfills, together with a leachate collection system and a monitoring network for the physicochemical quality of water and soil. Keywords: Industrial solid waste, soil contamination, Mining waste

  5. THE METHODS OF EXTRACTING WATER INFORMATION FROM SPOT IMAGE

    Institute of Scientific and Technical Information of China (English)

    DU Jin-kang; FENG Xue-zhi; et al.

    2002-01-01

    Some techniques and methods for deriving water information from SPOT-4 (XI) imagery were investigated and discussed in this paper. An algorithm of decision-tree (DT) classification, which includes several classifiers based on the spectral response characteristics of water bodies and other objects, was developed and put forward to delineate water bodies. Another decision-tree classification algorithm based on both spectral characteristics and auxiliary information from DEM and slope (DTDS) was also designed for water body extraction. In addition, the supervised maximum-likelihood classification (MLC) method and the unsupervised iterative self-organizing data analysis technique (ISODATA) were used to extract water bodies for comparison purposes. An index was designed and used to assess the accuracy of the different methods adopted in the research. Results have shown that water extraction accuracy varied with the techniques applied: it was low using ISODATA, very high using the DT algorithm, and higher still using both DTDS and MLC.
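
    As a rough illustration of a hand-built decision-tree-style water classifier, the sketch below applies simple spectral rules (an NDWI-style green/NIR ratio plus a near-infrared darkness test) to hypothetical reflectance tiles. The bands, thresholds and rule structure are assumptions for illustration and are not the classifiers used in the paper.

```python
import numpy as np

def classify_water(green, nir, ndwi_thresh=0.2, nir_thresh=0.15):
    """Label pixels as water (1) or non-water (0) using simple decision-tree rules."""
    ndwi = (green - nir) / (green + nir + 1e-6)   # water reflects green, absorbs NIR
    water = np.zeros(green.shape, dtype=np.uint8)
    # Node 1: strong NDWI response -> water.
    water[ndwi > ndwi_thresh] = 1
    # Node 2: otherwise, very dark NIR pixels are also labelled water.
    water[(ndwi <= ndwi_thresh) & (nir < nir_thresh)] = 1
    return water

# Hypothetical 4x4 reflectance tiles (invented values).
green = np.array([[0.30, 0.28, 0.10, 0.12],
                  [0.31, 0.27, 0.11, 0.13],
                  [0.09, 0.10, 0.25, 0.26],
                  [0.08, 0.11, 0.24, 0.27]])
nir = np.array([[0.05, 0.06, 0.40, 0.42],
                [0.04, 0.07, 0.41, 0.43],
                [0.39, 0.38, 0.06, 0.05],
                [0.40, 0.37, 0.07, 0.06]])
print(classify_water(green, nir))
```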

  6. Extraction of spatial information for low-bandwidth telerehabilitation applications

    Directory of Open Access Journals (Sweden)

    Kok Kiong Tan, PhD

    2014-09-01

    Full Text Available Telemedicine applications, based on two-dimensional (2D) video conferencing technology, have been around for the past 15 to 20 yr. They have been demonstrated to be acceptable for face-to-face consultations and useful for visual examination of wounds and abrasions. However, certain telerehabilitation assessments need the use of spatial information in order to accurately assess the patient’s condition, and sending three-dimensional video data over low-bandwidth networks is extremely challenging. This article proposes an innovative way of extracting the key spatial information from the patient’s movement during telerehabilitation assessment based on 2D video, and then presenting the extracted data using graph plots alongside the video to help physicians in assessments with minimum burden on existing video data transfer. Some common rehabilitation scenarios are chosen for illustration, and experiments are conducted based on skeletal tracking and color detection algorithms using the Microsoft Kinect sensor. Extracted data are analyzed in detail and their usability discussed.

  7. Transliteration normalization for Information Extraction and Machine Translation

    Directory of Open Access Journals (Sweden)

    Yuval Marton

    2014-12-01

    Full Text Available Foreign name transliterations typically include multiple spelling variants. These variants cause data sparseness and inconsistency problems, increase the Out-of-Vocabulary (OOV) rate, and present challenges for Machine Translation, Information Extraction and other natural language processing (NLP) tasks. This work aims to identify and cluster name spelling variants using a Statistical Machine Translation method: word alignment. The variants are identified by being aligned to the same “pivot” name in another language (the source language in Machine Translation settings). Based on word-to-word translation and transliteration probabilities, as well as the string edit distance metric, names with similar spellings in the target language are clustered and then normalized to a canonical form. With this approach, tens of thousands of high-precision name transliteration spelling variants are extracted from sentence-aligned bilingual corpora in Arabic and English (in both languages). When these normalized name spelling variants are applied to Information Extraction tasks, improvements over strong baseline systems are observed. When applied to Machine Translation tasks, a large improvement potential is shown.
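
    A toy sketch of the clustering and normalization step alone (the word-alignment step against pivot names is omitted): spelling variants are grouped by edit distance and each group is mapped to its most frequent member. The sample names and distance threshold are illustrative assumptions, not the paper's data.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def normalize_variants(names, max_dist=2):
    """Cluster spelling variants and map each to the most frequent spelling."""
    counts = Counter(names)
    clusters = []                       # each cluster is a list of distinct spellings
    for name in counts:
        for cluster in clusters:
            if edit_distance(name, cluster[0]) <= max_dist:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    mapping = {}
    for cluster in clusters:
        canonical = max(cluster, key=counts.get)   # most frequent spelling wins
        for name in cluster:
            mapping[name] = canonical
    return mapping

variants = ["Mohammed", "Mohamed", "Muhammad", "Mohammed", "Qaddafi", "Gaddafi"]
print(normalize_variants(variants))
```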

  8. 30 CFR 75.1200-1 - Additional information on mine map.

    Science.gov (United States)

    2010-07-01

    ... SAFETY AND HEALTH MANDATORY SAFETY STANDARDS-UNDERGROUND COAL MINES Maps § 75.1200-1 Additional... symbols; (g) The location of railroad tracks and public highways leading to the mine, and mine buildings... permanent base line points coordinated with the underground and surface mine traverses, and the location and...

  9. [Study on Information Extraction of Clinic Expert Information from Hospital Portals].

    Science.gov (United States)

    Zhang, Yuanpeng; Dong, Jiancheng; Qian, Danmin; Geng, Xingyun; Wu, Huiqun; Wang, Li

    2015-12-01

    Clinic expert information provides important references for residents in need of hospital care. Usually, such information is hidden in the deep web and cannot be directly indexed by search engines. To extract clinic expert information from the deep web, the first challenge is to make a judgment on search forms. This paper proposes a novel method based on a domain model, which is a tree structure constructed from the attributes of search interfaces. With this model, search interfaces can be classified into a domain and filled in with domain keywords. Another challenge is to extract information from the returned web pages indexed by search interfaces. To filter the noise on a web page, a block importance model is proposed. The experiment results indicated that the domain model yielded a precision 10.83% higher than that of the rule-based method, whereas the block importance model yielded an F₁ measure 10.5% higher than that of the XPath method.

  10. PRODUCT FEATURE EXTRACTION BASED ON OPINION MINING

    Institute of Scientific and Technical Information of China (English)

    刘羽; 曹瑞娟

    2014-01-01

    Exploring product features can help manufacturers and service providers make targeted improvements to product performance and guide users toward a comprehensive understanding of each product function. By analysing product information and review information, we build a three-layer mining model to achieve opinion-mining-based product feature extraction; the third mining layer innovatively combines association rules with dependency analysis. Experimental data were obtained from the Internet using web crawler technology, and the experimental results demonstrate the effectiveness of the model and the method.
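
    A much-simplified stand-in for the feature-extraction idea: count candidate feature nouns that frequently co-occur with opinion words in the same review sentence and keep those above a support threshold. The word lists, reviews and threshold are invented for illustration; the paper's actual model combines association rules with dependency parsing, which this sketch does not implement.

```python
from collections import Counter

reviews = [
    "the battery life is great and the screen is sharp",
    "battery drains fast but the camera is excellent",
    "love the screen , the camera is good , battery could be better",
]
opinion_words = {"great", "sharp", "excellent", "good", "better", "fast", "love"}
stopwords = {"the", "is", "and", "but", "could", "be", ",", "life"}

# Count how often each candidate noun co-occurs with an opinion word in a sentence.
support = Counter()
for sentence in reviews:
    tokens = sentence.split()
    candidates = {t for t in tokens if t not in opinion_words and t not in stopwords}
    if any(t in opinion_words for t in tokens):
        support.update(candidates)

min_support = 2                       # association-rule style support threshold
features = [word for word, count in support.items() if count >= min_support]
print("extracted product features:", features)
```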

  11. Extraction of Coupling Information From $Z' \\to jj$

    CERN Document Server

    Rizzo, T G

    1993-01-01

    An analysis by the ATLAS Collaboration has recently shown, contrary to popular belief, that a combination of strategic cuts, excellent mass resolution, and detailed knowledge of the QCD backgrounds from direct measurements can be used to extract a signal in the $Z' \\to jj$ channel in excess of $6\\sigma$ for certain classes of extended electroweak models. We explore the possibility that the data extracted from the $Z'$ dijet peak will have sufficient statistical power to supply information on the couplings of the $Z'$, provided it is used in conjunction with complementary results from the $Z' \\to \\ell^+ \\ell^-$ `discovery' channel. We show, for a 1 TeV $Z'$ produced at the SSC, that this technique can provide a powerful new tool with which to identify the origin of $Z'$'s.

  12. Extraction of coupling information from Z'-->jj

    Science.gov (United States)

    Rizzo, Thomas G.

    1993-11-01

    An analysis by the ATLAS Collaboration has recently shown, contrary to popular belief, that a combination of strategic cuts, excellent mass resolution, and detailed knowledge of the QCD backgrounds from direct measurements can be used to extract a signal in the Z'-->jj channel for certain classes of extended electroweak models. We explore the possibility that the data extracted from the Z' dijet peak will have sufficient statistical power to supply information on the couplings of the Z' provided it is used in conjunction with complementary results from the Z'-->l+l- ``discovery'' channel. We show, for a 1 TeV Z' produced at the SSC, that this technique can provide a powerful new tool with which to identify the origin of Z'. Extensions of this analysis to the CERN LHC as well as for a more massive Z' are discussed.

  13. Extracting Backbones from Weighted Complex Networks with Incomplete Information

    Directory of Open Access Journals (Sweden)

    Liqiang Qian

    2015-01-01

    Full Text Available The backbone is the natural abstraction of a complex network, which can help people understand a networked system in a more simplified form. Traditional backbone extraction methods tend to include many outliers in the backbone. What is more, they often suffer from computational inefficiency: the exhaustive search of all nodes or edges is often prohibitively expensive. In this paper, we propose a backbone extraction heuristic with incomplete information (BEHwII) to find the backbone in a complex weighted network. First, a strict filtering rule is carefully designed to determine which edges are to be preserved or discarded. Second, we present a local search model that examines part of the edges in an iterative way, relying only on local/incomplete knowledge rather than the global view of the network. Experimental results on four real-life networks demonstrate the advantage of BEHwII over the classic disparity filter method in terms of both effectiveness and efficiency.
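
    The disparity filter used as the baseline above has a compact formulation: an edge is kept when its weight is statistically surprising given the strength and degree of at least one endpoint. A minimal sketch of that baseline (not BEHwII itself) on a toy weighted graph; the edge list and significance level are illustrative.

```python
def disparity_filter(edges, alpha=0.05):
    """Keep edge (u, v, w) if it is significant for at least one endpoint.

    Significance follows the disparity filter: p = (1 - w / s_u) ** (k_u - 1),
    where s_u is the node strength and k_u its degree.
    """
    strength, degree = {}, {}
    for u, v, w in edges:
        for node in (u, v):
            strength[node] = strength.get(node, 0.0) + w
            degree[node] = degree.get(node, 0) + 1

    def significant(node, w):
        k = degree[node]
        if k <= 1:                      # single-edge nodes are kept by convention
            return True
        return (1.0 - w / strength[node]) ** (k - 1) < alpha

    return [(u, v, w) for u, v, w in edges
            if significant(u, w) or significant(v, w)]

edges = [("a", "b", 10.0), ("a", "c", 0.1), ("a", "d", 0.1),
         ("b", "c", 5.0), ("c", "d", 0.2)]
print(disparity_filter(edges, alpha=0.05))   # only the heavy edges survive
```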

  14. Text mining for pharmacovigilance: Using machine learning for drug name recognition and drug-drug interaction extraction and classification.

    Science.gov (United States)

    Ben Abacha, Asma; Chowdhury, Md Faisal Mahbub; Karanasiou, Aikaterini; Mrabet, Yassine; Lavelli, Alberto; Zweigenbaum, Pierre

    2015-12-01

    Pharmacovigilance (PV) is defined by the World Health Organization as the science and activities related to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem. An essential aspect in PV is to acquire knowledge about Drug-Drug Interactions (DDIs). The shared tasks on DDI-Extraction organized in 2011 and 2013 have pointed out the importance of this issue and provided benchmarks for: Drug Name Recognition, DDI extraction and DDI classification. In this paper, we present our text mining systems for these tasks and evaluate their results on the DDI-Extraction benchmarks. Our systems rely on machine learning techniques using both feature-based and kernel-based methods. The obtained results for drug name recognition are encouraging. For DDI-Extraction, our hybrid system combining a feature-based method and a kernel-based method was ranked second in the DDI-Extraction-2011 challenge, and our two-step system for DDI detection and classification was ranked first in the DDI-Extraction-2013 task at SemEval. We discuss our methods and results and give pointers to future work.
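
    As a rough, much-simplified stand-in for the feature-based side of such a system, the sketch below trains a linear SVM on bag-of-words features of sentences labelled for the presence of a drug-drug interaction. The tiny corpus and labels are invented, scikit-learn is assumed to be available, and the authors' actual systems use far richer features and kernel methods.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented training sentences: 1 = describes a drug-drug interaction, 0 = does not.
sentences = [
    "Concomitant use of drug A with drug B increases the risk of bleeding.",
    "Drug C plasma levels are elevated when co-administered with drug D.",
    "Drug E is indicated for the treatment of hypertension.",
    "Drug F was well tolerated in the majority of patients.",
]
labels = [1, 1, 0, 0]

# Bag-of-words (word and bigram) features feeding a linear SVM classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(sentences, labels)

test = ["Co-administration of drug G with drug H increases exposure to drug H."]
print(model.predict(test))   # expected to flag an interaction
```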

  15. Audio enabled information extraction system for cricket and hockey domains

    CERN Document Server

    Saraswathi, S; B., Sai Vamsi Krishna; S, Suresh Reddy

    2010-01-01

    The proposed system aims to retrieve summarized information from documents collected via a web-based search engine in response to user queries in the cricket and hockey domains. The system is designed to take voice commands as keywords for search. The parts of speech in the query are extracted using a natural language extractor for English. Based on the keywords, the search is categorized into two types: (1) concept-wise search, in which information relevant to the query is retrieved based on the keywords and the concept words related to them, and the retrieved information is summarized using a probabilistic approach and a weighted-means algorithm; and (2) keyword search, which extracts the results relevant to the query from the highly ranked documents returned by the search engine. The relevant search results are retrieved and the keywords are then used in the summarization step, which follows the weighted and probabilistic approaches in order to identify the data comparable to the k...

  16. KneeTex: an ontology-driven system for information extraction from MRI reports.

    Science.gov (United States)

    Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate

    2015-01-01

    In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated
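
    A minimal sketch of the ontology-driven entity-recognition step alone: a lexicon mapping surface terms to ontology concepts drives a greedy longest-match over the report text and assigns semantic types. The lexicon entries and concept identifiers below are invented and do not come from TRAK; the full system adds coordination, co-reference and template filling that this sketch omits.

```python
# Hypothetical lexicon: surface form -> (concept id, semantic type).
LEXICON = {
    "anterior cruciate ligament": ("KNEE:0001", "Ligament"),
    "medial meniscus": ("KNEE:0002", "Meniscus"),
    "tear": ("KNEE:0003", "Finding"),
    "joint effusion": ("KNEE:0004", "Finding"),
}
MAX_LEN = max(len(term.split()) for term in LEXICON)

def recognize(text):
    """Greedy longest-match dictionary lookup that assigns semantic types."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in LEXICON:
                concept, sem_type = LEXICON[candidate]
                entities.append((candidate, concept, sem_type))
                i += n
                break
        else:
            i += 1
    return entities

report = "There is a tear of the anterior cruciate ligament with a small joint effusion."
print(recognize(report))
```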

  17. GeoIRIS: Geospatial Information Retrieval and Indexing System-Content Mining, Semantics Modeling, and Complex Queries.

    Science.gov (United States)

    Shyu, Chi-Ren; Klaric, Matt; Scott, Grant J; Barb, Adrian S; Davis, Curt H; Palaniappan, Kannappan

    2007-04-01

    Searching for relevant knowledge across heterogeneous geospatial databases requires an extensive knowledge of the semantic meaning of images, a keen eye for visual patterns, and efficient strategies for collecting and analyzing data with minimal human intervention. In this paper, we present our recently developed content-based multimodal Geospatial Information Retrieval and Indexing System (GeoIRIS) which includes automatic feature extraction, visual content mining from large-scale image databases, and high-dimensional database indexing for fast retrieval. Using these underpinnings, we have developed techniques for complex queries that merge information from heterogeneous geospatial databases, retrievals of objects based on shape and visual characteristics, analysis of multiobject relationships for the retrieval of objects in specific spatial configurations, and semantic models to link low-level image features with high-level visual descriptors. GeoIRIS brings this diverse set of technologies together into a coherent system with an aim of allowing image analysts to more rapidly identify relevant imagery. GeoIRIS is able to answer analysts' questions in seconds, such as "given a query image, show me database satellite images that have similar objects and spatial relationship that are within a certain radius of a landmark."

  18. An Enhanced Text-Mining Framework for Extracting Disaster Relevant Data through Social Media and Remote Sensing Data Fusion

    Science.gov (United States)

    Scheele, C. J.; Huang, Q.

    2016-12-01

    In the past decade, the rise of social media has led to the development of a vast number of social media services and applications. Disaster management represents one such application, leveraging the massive data generated for event detection, response, and recovery. In order to find disaster-relevant social media data, current approaches utilize natural language processing (NLP) methods based on keywords, or machine learning algorithms relying on text only. However, these approaches cannot be perfectly accurate due to the variability and uncertainty in the language used on social media. To improve on current methods, an enhanced text-mining framework is proposed that incorporates location information from social media and authoritative remote sensing datasets for detecting disaster-relevant social media posts, which are identified by assessing the textual content with common text mining methods and by how each post relates spatiotemporally to the disaster event. To assess the framework, geo-tagged Tweets were collected for three different spatial and temporal disaster events: hurricane, flood, and tornado. Remote sensing data and products for each event were then collected using RealEarth™. Both Naive Bayes and Logistic Regression classifiers were used to compare the accuracy within the enhanced text-mining framework. Finally, the accuracies from the enhanced text-mining framework were compared to the current text-only methods for each of the case study disaster events. The results from this study address the need for more authoritative data when using social media in disaster management applications.
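
    A compressed sketch of the two ingredients such a framework combines: a text classifier for disaster relevance and a spatial check of each geotagged post against the event footprint. The sample tweets, event coordinates and distance threshold are invented for illustration, scikit-learn is assumed, and this is not the authors' actual pipeline.

```python
import math
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented labelled tweets: 1 = disaster relevant, 0 = not.
train_text = ["flood water rising on main street", "river overflowing homes evacuated",
              "great pizza downtown tonight", "traffic is slow near the stadium"]
train_labels = [1, 1, 0, 0]
clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(train_text, train_labels)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

event_lat, event_lon, radius_km = 29.76, -95.36, 50.0   # hypothetical flood centroid

def is_relevant(text, lat, lon):
    textually = clf.predict([text])[0] == 1
    spatially = haversine_km(lat, lon, event_lat, event_lon) <= radius_km
    return textually and spatially

print(is_relevant("flood water in the street near my house", 29.8, -95.4))
```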

  19. Using XBRL Technology to Extract Competitive Information from Financial Statements

    Directory of Open Access Journals (Sweden)

    Dominik Ditter

    2011-12-01

    Full Text Available The eXtensible Business Reporting Language, or XBRL, is a reporting format for the automatic and electronic exchange of business and financial data. In XBRL every single reported fact is marked with a unique tag, enabling a full computer-based readout of financial data. It has the potential to improve the collection and analysis of financial data for Competitive Intelligence (e.g., the profiling of publicly available financial statements). The article describes how easily information from XBRL reports can be extracted.
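
    Because XBRL instance documents are tagged XML, a first-pass extraction can simply walk the document and collect the tagged facts. The sketch below does this with the standard library on an invented, heavily simplified XBRL-style snippet; real filings use full taxonomies and context definitions that this illustration ignores.

```python
import xml.etree.ElementTree as ET

# Simplified, invented XBRL-style instance document.
instance = """
<xbrl xmlns:us-gaap="http://example.com/us-gaap">
  <us-gaap:Revenues contextRef="FY2010" unitRef="USD" decimals="0">1500000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2010" unitRef="USD" decimals="0">230000</us-gaap:NetIncomeLoss>
</xbrl>
"""

root = ET.fromstring(instance)
facts = {}
for element in root:
    # Strip the namespace from '{uri}LocalName' to get the reported concept name.
    concept = element.tag.split("}", 1)[-1]
    facts[concept] = {"value": float(element.text),
                      "context": element.get("contextRef"),
                      "unit": element.get("unitRef")}

print(facts["Revenues"]["value"], facts["NetIncomeLoss"]["context"])
```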

  20. A High Accuracy Method for Semi-supervised Information Extraction

    Energy Technology Data Exchange (ETDEWEB)

    Tratz, Stephen C.; Sanfilippo, Antonio P.

    2007-04-22

    Customization to specific domains of discourse and/or user requirements is one of the greatest challenges for today’s Information Extraction (IE) systems. While demonstrably effective, both rule-based and supervised machine learning approaches to IE customization pose too high a burden on the user. Semi-supervised learning approaches may in principle offer a more resource effective solution but are still insufficiently accurate to grant realistic application. We demonstrate that this limitation can be overcome by integrating fully-supervised learning techniques within a semi-supervised IE approach, without increasing resource requirements.

  1. A Novel Visual Data Mining Module for the Geographical Information System gvSIG

    Directory of Open Access Journals (Sweden)

    Romel Vázquez-Rodríguez

    2013-01-01

    Full Text Available The exploration of large GIS models containing spatio-temporal information is a challenge. In this paper we propose the integration of scientific visualization (ScVis) techniques into geographic information systems (GIS) as an alternative for the visual analysis of data. Providing GIS with such tools improves the analysis and understanding of datasets with very low spatial density and allows correlations between variables in time and space to be found. In this regard, we present a new visual data mining tool for the GIS gvSIG. This tool has been implemented as a gvSIG module and contains several ScVis techniques for multiparameter data with a wide range of possibilities to explore the data interactively. The developed module is a powerful visual data mining and data visualization tool for obtaining knowledge from multiple datasets in time and space. A real case study with meteorological data from Villa Clara province (Cuba) is presented, where the implemented visualization techniques were used to analyze the available datasets. Although it is tested with meteorological data, the developed module is of general application in the sense that it can be used in multiple application fields related to Earth Sciences.

  2. Applied data mining for business and industry

    CERN Document Server

    Giudici, Paolo

    2009-01-01

    The increasing availability of data in our current, information overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications. Introduces data mining methods and applications. Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods. Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining. Features detailed case studies based on applied projects within industry. Incorporates discussion of data mining software, with case studies a...

  3. Mining (except Oil and Gas) Sector (NAICS 212)

    Science.gov (United States)

    EPA Regulatory and enforcement information for the mining sector, including metal mining & nonmetallic mineral mining and quarrying. Includes information about asbestos, coal mining, mountaintop mining, Clean Water Act section 404, and abandoned mine lands

  4. Automated information and control complex of hydro-gas endogenous mine processes

    Science.gov (United States)

    Davkaev, K. S.; Lyakhovets, M. V.; Gulevich, T. M.; Zolin, K. A.

    2017-09-01

    An automated information and control complex is considered, designed to prevent accidents related to the aerological situation in underground workings, to keep account of individual devices as they are issued and returned, to transmit and display measurement data, and to form pre-emptive control decisions. Examples are given for the automated workstation of an operator monitoring the air-gas situation with individual devices. The statistical characteristics of field data characterizing the aerological situation in the mine are obtained. The studies of these statistical characteristics confirm the feasibility of creating a subsystem of controlled gas distribution with an adaptive arrangement of gas-control points. An adaptive (multivariant) algorithm has been developed for processing measurement information on continuous multidimensional quantities and influencing factors.

  5. Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

    DEFF Research Database (Denmark)

    Debortoli, Stefan; Müller, Oliver; Junglas, Iris

    2016-01-01

    It is estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video); and much of it is expressed in rich and ambiguous natural language. Traditionally, the analysis of natural language has prompted the use of qualitative data analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase the use of probabilistic topic modeling via Latent Dirichlet Allocation, an unsupervised text mining technique, in combination with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems...
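
    A compact sketch of the core step named above, probabilistic topic modeling with Latent Dirichlet Allocation, using scikit-learn on a handful of invented review snippets. The real tutorial works on more than 12,000 customer reviews and pairs the topics with a LASSO regression, which is omitted here.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "battery lasts all day and charges quickly",
    "battery died after a week, charging is slow",
    "support answered my ticket fast, very helpful staff",
    "customer support never replied to my emails",
    "screen is bright and the display looks sharp",
    "display has dead pixels, screen quality is poor",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(reviews)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(doc_term)          # document-topic proportions

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")
```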

  6. Extraction of Profile Information from Cloud Contaminated Radiances. Appendixes 2

    Science.gov (United States)

    Smith, W. L.; Zhou, D. K.; Huang, H.-L.; Li, Jun; Liu, X.; Larar, A. M.

    2003-01-01

    Clouds act to reduce the signal level and may produce noise dependence on the complexity of the cloud properties and the manner in which they are treated in the profile retrieval process. There are essentially three ways to extract profile information from cloud contaminated radiances: (1) cloud-clearing using spatially adjacent cloud contaminated radiance measurements, (2) retrieval based upon the assumption of opaque cloud conditions, and (3) retrieval or radiance assimilation using a physically correct cloud radiative transfer model which accounts for the absorption and scattering of the radiance observed. Cloud clearing extracts the radiance arising from the clear air portion of partly clouded fields of view permitting soundings to the surface or the assimilation of radiances as in the clear field of view case. However, the accuracy of the clear air radiance signal depends upon the cloud height and optical property uniformity across the two fields of view used in the cloud clearing process. The assumption of opaque clouds within the field of view permits relatively accurate profiles to be retrieved down to near cloud top levels, the accuracy near the cloud top level being dependent upon the actual microphysical properties of the cloud. The use of a physically correct cloud radiative transfer model enables accurate retrievals down to cloud top levels and below semi-transparent cloud layers (e.g., cirrus). It should also be possible to assimilate cloudy radiances directly into the model given a physically correct cloud radiative transfer model using geometric and microphysical cloud parameters retrieved from the radiance spectra as initial cloud variables in the radiance assimilation process. This presentation reviews the above three ways to extract profile information from cloud contaminated radiances. NPOESS Airborne Sounder Testbed-Interferometer radiance spectra and Aqua satellite AIRS radiance spectra are used to illustrate how cloudy radiances can be used

  7. Karst rocky desertification information extraction with EO-1 Hyperion data

    Science.gov (United States)

    Yue, Yuemin; Wang, Kelin; Zhang, Bing; Jiao, Quanjun; Yu, Yizun

    2008-12-01

    Karst rocky desertification is a special kind of land desertification developed under intense human impacts on the vulnerable eco-geo-environment of the karst ecosystem. The process of karst rocky desertification results in simultaneous and complex variations of many interrelated soil, rock and vegetation biogeophysical parameters, rendering it difficult to develop simple and robust remote sensing mapping and monitoring approaches. In this study, we aimed to use Earth Observing 1 (EO-1) Hyperion hyperspectral data to extract karst rocky desertification information. A spectral unmixing model based on a Monte Carlo approach was employed to quantify the fractional cover of photosynthetic vegetation (PV), non-photosynthetic vegetation (NPV) and bare substrates. The results showed that the SWIR (1.9-2.35 μm) portion of the spectrum differed significantly among the PV, NPV and bare rock spectral properties. Using the full optical range or only the SWIR (1.9-2.35 μm) region of Hyperion to decompose the image into PV, NPV and bare-substrate covers has limitations. However, when the tied-SWIR was used, the sub-pixel fractional covers of PV, NPV and bare substrates were accurately estimated. Our study indicates that the "tied-spectrum" method effectively accentuates the spectral characteristics of materials, while the spectral unmixing model based on a Monte Carlo approach is a useful tool to automatically extract mixed ground objects in the karst ecosystem. Karst rocky desertification information can be accurately extracted with EO-1 Hyperion. Imaging spectroscopy can provide a powerful methodology toward understanding the extent and spatial pattern of land degradation in karst ecosystems.

  8. Automated extraction of chemical structure information from digital raster images

    Directory of Open Access Journals (Sweden)

    Shedden Kerby A

    2009-02-01

    Full Text Available Abstract Background To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results This paper aims to provide critical reviews of these systems and also reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. Conclusion The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links

  9. Big Data Mining: Tools & Algorithms

    Directory of Open Access Journals (Sweden)

    Adeel Shiraz Hashmi

    2016-03-01

    Full Text Available We are now in the Big Data era, and there is a growing demand for tools which can process and analyze it. Big data analytics deals with extracting valuable information from complex data that can’t be handled by traditional data mining tools. This paper surveys the available tools which can handle large volumes of data as well as evolving data streams. The data mining tools and algorithms which can handle big data have also been summarized, and one of the tools has been used for mining of large datasets using distributed algorithms.

  10. Extraction of hidden information by efficient community detection in networks

    Science.gov (United States)

    Lee, Jooyoung; Lee, Juyong; Gross, Steven

    2013-03-01

    Currently, we are overwhelmed by a deluge of experimental data, and network physics has the potential to become an invaluable method to increase our understanding of large interacting datasets. However, this potential is often unrealized for two reasons: uncovering the hidden community structure of a network, known as community detection, is difficult, and further, even if one has an idea of this community structure, it is not a priori obvious how to efficiently use this information. Here, to address both of these issues, we, first, identify optimal community structure of given networks in terms of modularity by utilizing a recently introduced community detection method. Second, we develop an approach to use this community information to extract hidden information from a network. When applied to a protein-protein interaction network, the proposed method outperforms current state-of-the-art methods that use only the local information of a network. The method is generally applicable to networks from many areas. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 20120001222).
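
    As a generic illustration of the first step described above (finding a modularity-based community structure before exploiting it), the sketch below runs the greedy modularity algorithm shipped with NetworkX on a toy graph. This is a stand-in for illustration, not the specific detection method the authors introduce.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Toy network with two obvious groups plus one bridge edge.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),        # group 1
                  ("d", "e"), ("e", "f"), ("d", "f"),        # group 2
                  ("c", "d")])                               # bridge

communities = greedy_modularity_communities(G)
print("communities:", [sorted(c) for c in communities])
print("modularity:", round(modularity(G, communities), 3))

# The recovered grouping can then serve as side information, e.g. to predict
# missing links preferentially inside communities rather than across them.
```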

  11. In vitro physiologically based extraction test (PBET) and bioaccessibility of arsenic and lead from various mine waste materials.

    Science.gov (United States)

    Bruce, Scott; Noller, Barry; Matanitobua, Vitukawalu; Ng, Jack

    2007-10-01

    In vivo models show that the bioavailability of soil contaminants varies between site and type of matrix. Studies demonstrated that assuming 100% bioavailability of arsenic (As) and lead (Pb) from soils and mine waste materials overestimates the risk associated with human exposure. In in vitro systems, the simulated bioavailability of a contaminant is referred to as the "bioaccessibility" and is used as an alternative quantitative indicator for in vivo derived bioavailability estimates. The general concept of the in vitro extraction test is to predict the bioavailability of inorganic substances from solid matrices by simulating the gastrointestinal tract (GIT) environment. The aims of this study were to: (1) investigate the bioaccessibility of As and Pb from various mine wastes, including tailings, heap leach, and waste rock, using a physiologically based extraction test (PBET); (2) validate the bioaccessibility values from PBET with in vivo bioavailability values measured using animal models; and (3) correlate PBET results with the bioavailability values measured from alternative in vivo models (rats and cattle, from Bruce, 2004). Significant correlation was observed between bioaccessibility values from PBET, and bioavailability values generated for both rats and cattle, demonstrating the potential to utilize PBET as a relatively inexpensive alternative to in vivo models for bioavailability assessment.

  12. Using Data Mining Techniques in Customer Segmentation

    Directory of Open Access Journals (Sweden)

    Hasan Ziafat

    2014-09-01

    Full Text Available Data mining plays an important role in marketing and is quite new. Although this field is expanding rapidly, data mining is still a foreign issue for many marketers who trust only their experience. Data mining techniques cannot substitute for the significant role of domain experts and their business knowledge. In other words, data mining algorithms are powerful but cannot work effectively without the active support of business experts. We can gain useful results by combining these techniques with business expertise. For instance, the ability of a data mining technique can be substantially increased by incorporating a person's experience in the field, or business information can be integrated into a data mining model to build a more successful result. Moreover, these results should always be evaluated by business experts. Thus, business knowledge can help and enrich the data mining results. On the other hand, data mining techniques can extract patterns that even the most experienced business people may have missed. In conclusion, the combination of business domain expertise with the power of data mining techniques can help organizations gain a competitive advantage in their efforts to optimize customer management. Clustering algorithms, a group of data mining techniques, are among the most commonly used ways to segment a data set according to similarities. This paper focuses on the topic of customer segmentation using data mining techniques. In other words, we theoretically discuss customer relationship management and then utilize a couple of data mining algorithms, especially clustering techniques, for customer segmentation. We concentrate on behavioral segmentation.
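
    A minimal behavioral segmentation sketch using k-means clustering on invented RFM-style features (recency, frequency, monetary value). The feature values, scaling choice and number of clusters are illustrative assumptions, not results from the paper; scikit-learn is assumed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented customer behavior: [recency_days, purchase_frequency, monetary_value].
customers = np.array([
    [10, 25, 900.0], [12, 22, 850.0], [8, 30, 1200.0],      # frequent, high value
    [200, 2, 50.0], [180, 3, 75.0], [220, 1, 30.0],          # lapsed, low value
    [60, 10, 300.0], [75, 8, 250.0],                          # mid-range
])

features = StandardScaler().fit_transform(customers)          # put features on one scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)

for segment in range(3):
    members = customers[kmeans.labels_ == segment]
    print(f"segment {segment}: {len(members)} customers, "
          f"mean spend {members[:, 2].mean():.0f}")
```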

  13. Unsupervised Learning of mDTD Extraction Patterns for Web Text Mining.

    Science.gov (United States)

    Kim, Dongseok; Jung, Hanmin; Lee, Gary Geunbae

    2003-01-01

    Presents a new extraction pattern, modified Document Type Definition (mDTD), which relies on analytical interpretation to identify extraction target from the contents of Web documents. Experiments with 330 Korean and 220 English Web documents on audio and video shopping sites yielded an average extraction precision of 91.3% for Korean and 81.9%…

  14. WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK – AN OVERVIEW

    Directory of Open Access Journals (Sweden)

    V. Lakshmi Praba

    2011-03-01

    Full Text Available Web Mining is the extraction of interesting and potentially useful patterns and information from the Web. It includes Web documents, hyperlinks between documents, and usage logs of web sites. The significant tasks for web mining can be listed as Information Retrieval, Information Selection/Extraction, Generalization and Analysis. Web information retrieval tools consider only the text on pages and ignore information in the links. The goal of Web structure mining is to explore a structural summary of the web. Web structure mining, focusing on link information, is an important aspect of web data. This paper presents an overview of PageRank and Improved PageRank and their working functionality in web structure mining.
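
    A self-contained power-iteration implementation of the basic PageRank computation surveyed above, run on a tiny link graph. The damping factor and iteration count are the conventional defaults rather than anything prescribed by the paper, and the link graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank for a dict mapping page -> list of outgoing links."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if not targets:                      # dangling page: spread rank evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```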

  15. Automated Extraction of Substance Use Information from Clinical Texts.

    Science.gov (United States)

    Wang, Yan; Chen, Elizabeth S; Pakhomov, Serguei; Arsoniadis, Elliot; Carter, Elizabeth W; Lindemann, Elizabeth; Sarkar, Indra Neil; Melton, Genevieve B

    2015-01-01

    Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use) as key risk factors for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status) based on Stanford Typed Dependencies. The developed NLP system leveraged linguistic resources and domain knowledge from a multi-site social history study, Propbank and the MiPACQ corpus. The system attained F-scores of 89.8, 84.6 and 89.4 respectively for alcohol, drug, and nicotine use statement detection, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 respectively for extraction of attributes. Our results suggest that NLP systems can achieve good performance when augmented with linguistic resources and domain knowledge when applied to a wide breadth of substance use free text clinical notes.

  16. Extraction of neutron spectral information from Bonner-Sphere data

    CERN Document Server

    Haney, J H; Zaidins, C S

    1999-01-01

    We have extended a least-squares method of extracting neutron spectral information from Bonner-sphere data which was previously developed by Zaidins et al. (Med. Phys. 5 (1978) 42). A pulse-height analysis with background stripping is employed, which provides a more accurate count rate for each sphere. Newer response curves by Mares and Schraube (Nucl. Instr. and Meth. A 366 (1994) 461) were included for the moderating spheres and the bare detector which comprise the Bonner spectrometer system. Finally, the neutron energy spectrum of interest was divided, using the philosophy of fuzzy logic, into three trapezoidal regimes corresponding to slow, moderate, and fast neutrons. Spectral data were taken using a PuBe source in two different environments, and the analyzed data are presented for these cases as slow, moderate, and fast neutron fluences. (author)
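
    The unfolding step amounts to a least-squares inversion of the sphere response matrix against the measured count rates. A schematic numpy sketch with an invented three-group response matrix (slow, moderate, fast) and invented count rates; the real analysis uses measured response curves and background-stripped rates, and constraints such as non-negativity are ignored here for brevity.

```python
import numpy as np

# Invented response matrix R: rows = Bonner spheres, columns = energy groups
# (slow, moderate, fast); entry R[i, j] is counts per unit fluence of group j.
R = np.array([
    [0.90, 0.30, 0.05],   # bare detector: responds mostly to slow neutrons
    [0.40, 0.80, 0.30],   # small moderating sphere
    [0.10, 0.60, 0.85],   # medium sphere
    [0.03, 0.25, 0.95],   # large sphere: responds mostly to fast neutrons
])
counts = np.array([125.0, 210.0, 180.0, 140.0])   # invented measured count rates

# Least-squares estimate of the group fluences (slow, moderate, fast).
fluence, residuals, rank, _ = np.linalg.lstsq(R, counts, rcond=None)
print("estimated group fluences:", np.round(fluence, 1))
```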

  17. ONTOGRABBING: Extracting Information from Texts Using Generative Ontologies

    DEFF Research Database (Denmark)

    Nilsson, Jørgen Fischer; Szymczak, Bartlomiej Antoni; Jensen, P.A.

    2009-01-01

    We describe principles for extracting information from texts using a so-called generative ontology in combination with syntactic analysis. Generative ontologies are introduced as semantic domains for natural language phrases. Generative ontologies extend ordinary finite ontologies with rules for producing recursively shaped terms representing the ontological content (ontological semantics) of NL noun phrases and other phrases. We focus here on achieving a robust, often only partial, ontology-driven parsing of and ascription of semantics to a sentence in the text corpus. The aim of the ontological analysis is primarily to identify paraphrases, thereby achieving a search functionality beyond mere keyword search with synsets. We further envisage use of the generative ontology as a phrase-based rather than word-based browser into text corpora.

  18. Domain-independent information extraction in unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H. [Sandia National Labs., Albuquerque, NM (United States). Software Surety Dept.

    1996-09-01

    Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks completeness when compared to systems with domain-specific knowledge bases, the results do look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal, as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.

  19. Querying and Extracting Timeline Information from Road Traffic Sensor Data.

    Science.gov (United States)

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-08-23

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system, a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset.

  20. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    Directory of Open Access Journals (Sweden)

    Ardi Imawan

    2016-08-01

    Full Text Available The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system, a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset.

  1. [Geographic information system based spatial analysis on chronic arsenic poisoning in a tin mining area, Thailand].

    Science.gov (United States)

    Zhang, Jianjun; Wu, Liping; Lin, Kun

    2007-05-01

    To explore the spatial features of arsenic contamination and its association with chronic arsenic poisoning in a tin mining area of Thailand, a geographic information system (GIS) was built by integrating arsenic concentrations in various environmental media with the locations of chronic arsenism patients. Spatial interpolation (IDW), buffer zoning, querying and rank correlation analysis were then applied. Groundwater and surface farming land were classified according to local environmental arsenic standards, and the relative risk areas were identified. The incidence of chronic arsenic poisoning was significantly correlated with the arsenic level in groundwater and with soil type, but not with water-soluble arsenic in soil (P > 0.05). The arsenic content in drinking water could be critical to chronic arsenic poisoning, and soil type could be an important factor affecting such poisoning. Trend analysis in GIS could provide a valuable tool for understanding the pollution situation and for disease surveillance.

  2. Optimal Cell Towers Distribution by using Spatial Mining and Geographic Information System

    CERN Document Server

    AL-Hamami, Alaa H

    2011-01-01

    The appearance of wireless communication is dramatically changing our lives. Mobile telecommunications emerged as a technological marvel allowing access to personal and other services, devices, computation and communication, in any place and at any time through effortless plug and play. Setting up wireless mobile networks often requires frequency assignment, communication protocol selection, routing scheme selection, and cell tower distribution. This research aims to optimize the cell tower distribution by using spatial mining with a Geographic Information System (GIS) as a tool. The distribution optimization can be done by applying a Digital Elevation Model (DEM) to the image of the area that must be covered, with two levels of hierarchy. The research applies the spatial association rules technique on the second level to select the best square in the cell for placing the antenna. From that, the proposal tries to minimize the number of installed towers, make tower locations feasible, and pr...

  3. Prospecting for dinosaurs on the mining frontier: The value of information in America's Gilded Age.

    Science.gov (United States)

    Rieppel, Lukas

    2015-04-01

    How much is a dinosaur worth? This essay offers an account of the way vertebrate fossils were priced in late 19th-century America to explore the process by which monetary values are established in science. Examining a long and drawn-out negotiation over the sale of an unusually rich dinosaur quarry in Wyoming, I argue that, on their own, abstract market principles did not suffice to mediate between supply and demand. Rather, people haggling over the price of dinosaur bones looked to social norms from the mineral industry for cues on how to value these rare and unusual objects, adopting a set of negotiation tactics that exploited asymmetries in the distribution of scarce information to secure the better end of the deal. On the mining frontier in America's Gilded Age, dinosaurs were thus valued in much the same way as any other scarce natural resource one could dig out of the ground, including gold, silver, and coal.

  4. Research of information classification and strategy intelligence extract algorithm based on military strategy hall

    Science.gov (United States)

    Chen, Lei; Li, Dehua; Yang, Jie

    2007-12-01

    Constructing a virtual international strategy environment needs many kinds of information, such as economic, political, military, diplomatic, cultural and scientific information. It is therefore very important to build a highly efficient management system for automatic information extraction, classification, recombination and analysis as the foundation and a component of the military strategy hall. This paper first uses an improved Boost algorithm to classify the obtained initial information, and then uses a strategy intelligence extraction algorithm to extract strategic intelligence from the initial information in order to help strategists analyze the information.

  5. Mining for diagnostic information in body surface potential maps: A comparison of feature selection techniques

    Directory of Open Access Journals (Sweden)

    McCullagh Paul J

    2005-09-01

    Full Text Available Abstract Background In body surface potential mapping, increased spatial sampling is used to allow more accurate detection of a cardiac abnormality. Although diagnostically superior to more conventional electrocardiographic techniques, the perceived complexity of the Body Surface Potential Map (BSPM) acquisition process has prohibited its acceptance in clinical practice. For this reason there is an interest in striking a compromise that samples the maximum electrocardiographic information with the minimum number of electrocardiographic recording sites. Methods In the current study, several techniques widely used in the domains of data mining and knowledge discovery have been employed to mine for diagnostic information in 192-lead BSPMs. In particular, the Single Variable Classifier (SVC) based filter and Sequential Forward Selection (SFS) based wrapper approaches to feature selection have been implemented and evaluated. Using a set of recordings from 116 subjects, the diagnostic ability of subsets of 3, 6, 9, 12, 24 and 32 electrocardiographic recording sites has been evaluated based on their ability to correctly assess the presence or absence of Myocardial Infarction (MI). Results It was observed that the wrapper approach, using sequential forward selection and a 5 nearest neighbour classifier, was capable of choosing a set of 24 recording sites that could correctly classify 82.8% of BSPMs. Although the filter method performed slightly less favourably, the performance was comparable, with a classification accuracy of 79.3%. In addition, experiments were conducted to show how (a) features chosen using the wrapper approach were specific to the classifier used in the selection model, and (b) lead subsets chosen were not necessarily unique. Conclusion It was concluded that both the filter and wrapper approaches adopted were suitable for guiding the choice of recording sites useful for determining the presence of MI. It should be noted however
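
    A small sketch of the wrapper approach described above (sequential forward selection wrapped around a 5-nearest-neighbour classifier), using scikit-learn's SequentialFeatureSelector on synthetic data as a stand-in for the 192-lead BSPM features. The data generator and subset size are assumptions; this reproduces the technique, not the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for BSPM data: 116 subjects, 32 candidate recording sites.
X, y = make_classification(n_samples=116, n_features=32, n_informative=6,
                           random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
selector = SequentialFeatureSelector(knn, n_features_to_select=6,
                                     direction="forward", cv=5)
selector.fit(X, y)

chosen = selector.get_support(indices=True)
score = cross_val_score(knn, X[:, chosen], y, cv=5).mean()
print("selected recording sites:", chosen)
print("cross-validated accuracy on the subset:", round(score, 3))
```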

  6. Numerical linear algebra in data mining

    Science.gov (United States)

    Eldén, Lars

    Ideas and algorithms from numerical linear algebra are important in several areas of data mining. We give an overview of linear algebra methods in text mining (information retrieval), pattern recognition (classification of handwritten digits), and PageRank computations for web search engines. The emphasis is on rank reduction as a method of extracting information from a data matrix, low-rank approximation of matrices using the singular value decomposition and clustering, and on eigenvalue methods for network analysis.
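
    A short numpy sketch of the rank-reduction idea mentioned above: a term-document matrix is approximated by a truncated singular value decomposition, the workhorse of latent semantic indexing in text mining. The tiny matrix is invented for illustration.

```python
import numpy as np

# Invented term-document matrix (rows = terms, columns = documents).
A = np.array([
    [3.0, 0.0, 1.0, 0.0],
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 4.0, 0.0, 2.0],
    [0.0, 3.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 0.0],
])

# Truncated SVD: keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print("rank-2 approximation error (relative Frobenius norm):", round(rel_error, 3))

# Documents can now be compared in the k-dimensional latent space.
doc_coords = np.diag(s[:k]) @ Vt[:k, :]
print("latent document coordinates:\n", np.round(doc_coords, 2))
```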

  7. A Volterra series-based method for extracting target echoes in the seafloor mining environment.

    Science.gov (United States)

    Zhao, Haiming; Ji, Yaqian; Hong, Yujiu; Hao, Qi; Ma, Liyong

    2016-09-01

    The purpose of this research was to evaluate the applicability of the Volterra adaptive method to predict the target echo of an ultrasonic signal in an underwater seafloor mining environment. There is growing interest in mining seafloor minerals because they offer an alternative source of rare metals. Mining the minerals causes the seafloor sediments to be stirred up and suspended in sea water. In such an environment, the target signals used for seafloor mapping cannot be detected because of the unavoidable presence of volume reverberation induced by the suspended sediments. The detection of target signals in reverberation is currently performed using a stochastic model (for example, the autoregressive (AR) model) based on the statistical characterisation of reverberation. However, we examined a new method of signal detection in volume reverberation based on the Volterra series, after confirming that the reverberation is a chaotic signal generated by a deterministic process. The advantage of this method over the stochastic model is that attributes of the specific physical process are considered in the signal detection problem. To test the Volterra series based method and its applicability to target signal detection in the volume reverberation environment derived from the seafloor mining process, we simulated the real-life conditions of seafloor mining in a water-filled tank of dimensions 5×3×1.8 m. The bottom of the tank was covered with 10 cm of an irregular sand layer under which 5 cm of an irregular cobalt-rich crust layer was placed. The bottom was interrogated by an acoustic wave generated as 16 μs pulses of 500 kHz frequency. This frequency is demonstrated to ensure a resolution on the order of one centimetre, which is adequate in exploration practice. Echo signals were collected with a data acquisition card (PCI 1714 UL, 12-bit). Detection of the target echo in these signals was performed by both the Volterra series based model and the AR model

  8. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    Science.gov (United States)

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

  9. Toxicity of sediments potentially contaminated by coal mining and natural gas extraction to unionid mussels and commonly tested benthic invertebrates.

    Science.gov (United States)

    Wang, Ning; Ingersoll, Christopher G; Kunz, James L; Brumbaugh, William G; Kane, Cindy M; Evans, R Brian; Alexander, Steven; Walker, Craig; Bakaletz, Steve

    2013-01-01

    Sediment toxicity tests were conducted to assess potential effects of contaminants associated with coal mining or natural gas extraction activities in the upper Tennessee River basin and eastern Cumberland River basin in the United States. Test species included two unionid mussels (rainbow mussel, Villosa iris, and wavy-rayed lampmussel, Lampsilis fasciola, 28-d exposures), and the commonly tested amphipod, Hyalella azteca (28-d exposure) and midge, Chironomus dilutus (10-d exposure). Sediments were collected from seven test sites with mussel communities classified as impacted and in proximity to coal mining or gas extraction activities, and from five reference sites with mussel communities classified as not impacted and no or limited coal mining or gas extraction activities. Additional samples were collected from six test sites potentially with high concentrations of polycyclic aromatic hydrocarbons (PAHs) and from a test site contaminated by a coal ash spill. Mean survival, length, or biomass of one or more test species was reduced in 10 of 14 test samples (71%) from impacted areas relative to the response of organisms in the five reference samples. A higher proportion of samples was classified as toxic to mussels (63% for rainbow mussels, 50% for wavy-rayed lampmussels) compared with amphipods (38%) or midge (38%). Concentrations of total recoverable metals and total PAHs in sediments did not exceed effects-based probable effect concentrations (PECs). However, the survival, length, or biomasses of the mussels were reduced significantly with increasing PEC quotients for metals and for total PAHs, or with increasing sum equilibrium-partitioning sediment benchmark toxic units for PAHs. The growth of the rainbow mussel also significantly decreased with increasing concentrations of a major anion (chloride) and major cations (calcium and magnesium) in sediment pore water. Results of the present study indicated that (1) the findings from laboratory tests were generally

  11. [Introduction to medical data mining].

    Science.gov (United States)

    Zhu, Lingyun; Wu, Baoming; Cao, Changxiu

    2003-09-01

    Modern medicine generates a great deal of information that is stored in medical databases. Extracting useful knowledge from these databases and providing scientific decision support for the diagnosis and treatment of disease is becoming increasingly necessary. Data mining in medicine can address this problem; it can also improve the management of hospital information and promote the development of telemedicine and community medicine. Because medical information is characterized by redundancy, multiple attributes, incompleteness and a close relationship with time, medical data mining differs from data mining in other domains. In this paper we discuss the key techniques of medical data mining, including pre-treatment of medical data, fusion of different patterns and resources, fast and robust mining algorithms, and the reliability of mining results. The methods and applications of medical data mining based on computational intelligence, such as artificial neural networks, fuzzy systems, evolutionary algorithms, rough sets, and support vector machines, are introduced. The features of and open problems in medical data mining are summarized in the last section.

  12. Techniques, Applications and Challenging Issue in Text Mining

    Directory of Open Access Journals (Sweden)

    Shaidah Jusoh

    2012-11-01

    Full Text Available Text mining is a very exciting research area as it tries to discover knowledge from unstructured texts. These texts can be found on desktops, intranets and the internet. The aim of this paper is to give an overview of text mining in terms of its techniques, application domains and most challenging issues. The focus is on the fundamental methods of text mining, which include natural language processing and information extraction. The paper also gives a short review of domains that have employed text mining. The challenging issues in text mining caused by the complexity of natural language are also addressed.

  13. Comparative Data Mining Analysis for Information Retrieval of MODIS Images: Monitoring Lake Turbidity Changes at Lake Okeechobee, Florida

    Science.gov (United States)

    In the remote sensing field, a frequently recurring question is: which computational intelligence or data mining algorithms are most suitable for the retrieval of essential information, given that most natural systems exhibit very high non-linearity? Among potential candidates mig...

  14. Ancient silver extraction in the Montevecchio mine basin (Sardinia, Italy): micro-chemical study of pyrometallurgical materials

    Science.gov (United States)

    De Caro, Tilde; Riccucci, Cristina; Parisi, Erica I.; Faraldi, Federica; Caschera, D.

    2013-12-01

    Different pyrometallurgical materials such as slags, refractory materials and thermally treated lead ores likely related to smelting and extractive processes and chronologically related to Punic and Roman periods (IV-III BC) have been found at Bocche di Sciria and Conca e Mosu in the Montevecchio mine basin (south western Sardinia, Italy), where archaeological findings and classical authors locate extractive metallurgy activities since pre-Roman times. By means of the combined use of X-ray diffraction (XRD), scanning electron microscopy (SEM) combined with energy-dispersive X-ray spectrometry (EDS), selected-area X-ray photoelectron spectroscopy (XPS) and optical microscopy (OM), micro-chemical and micro-structural investigations have been carried out in order to identify the nature of the pyrometallurgical materials, to decipher the processes carried out there and their technological steps and to determine the technological level of competence reached by the ancient metallurgists. The results confirm that the findings can be associated with smelting and extractive processes carried out close to the metal ore deposits first for the argentiferous lead production and, then, for the silver recovery via a cupellation process. Finally, the results disclose the high level of technological competence of the ancient metallurgists able to carry out complex high-temperature processes to treat the argentiferous lead ores and to recover low amounts of silver via high-temperature lead-selective oxidation.

  15. Analysis on Recommended System for Web Information Retrieval Using HMM

    Directory of Open Access Journals (Sweden)

    Himangni Rathore

    2014-11-01

    Full Text Available The web is a rich domain of data and knowledge, spread over the world in an unstructured manner, and a continuously growing number of users access this information over the internet. Web mining is an application of data mining in which web-related data are extracted and manipulated to obtain knowledge. It is further divided into three major domains: web usage mining, web content mining and web structure mining. The proposed work is concerned with web usage mining. The aim is to improve user feedback and user navigation pattern discovery for a CRM system. Finally, an HMM-based algorithm is used for finding patterns in the data, a method that promises to provide more accurate recommendations.
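
    The sketch below illustrates navigation-pattern mining on hypothetical click logs using a plain first-order Markov transition model; this is a simplified stand-in for the HMM the paper proposes, but it shows the same idea of recommending the next page from observed transitions.

```python
from collections import Counter, defaultdict

sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "reviews", "cart"],
    ["home", "support", "faq"],
    ["home", "products", "cart", "checkout"],
]

# Count page-to-page transitions observed in the sessions.
transitions = defaultdict(Counter)
for s in sessions:
    for current_page, next_page in zip(s, s[1:]):
        transitions[current_page][next_page] += 1

def recommend(page, k=2):
    # Suggest the k most likely next pages with their empirical probabilities.
    total = sum(transitions[page].values())
    return [(p, round(c / total, 2)) for p, c in transitions[page].most_common(k)]

print(recommend("products"))   # most likely next pages after "products"
```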

  16. Text mining: A Brief survey

    Directory of Open Access Journals (Sweden)

    Falguni N. Patel , Neha R. Soni

    2012-12-01

    Full Text Available Unstructured texts, which contain a massive amount of information, cannot simply be processed further by computers. Therefore, specific processing methods and algorithms are required in order to extract useful patterns. The process of extracting interesting information and knowledge from unstructured text is accomplished through text mining. In this paper, we discuss text mining as a recent and interesting field, detailing the steps involved in the overall process. We also discuss the technologies that equip computers to handle natural language so that they may analyze, understand, and even generate text. In addition, we briefly discuss a number of successful applications of text mining, both current and anticipated.

  17. Extracting information in spike time patterns with wavelets and information theory.

    Science.gov (United States)

    Lopes-dos-Santos, Vítor; Panzeri, Stefano; Kayser, Christoph; Diamond, Mathew E; Quian Quiroga, Rodrigo

    2015-02-01

    We present a new method to assess the information carried by temporal patterns in spike trains. The method first performs a wavelet decomposition of the spike trains, then uses Shannon information to select a subset of coefficients carrying information, and finally assesses timing information in terms of decoding performance: the ability to identify the presented stimuli from spike train patterns. We show that the method allows: 1) a robust assessment of the information carried by spike time patterns even when this is distributed across multiple time scales and time points; 2) an effective denoising of the raster plots that improves the estimate of stimulus tuning of spike trains; and 3) an assessment of the information carried by temporally coordinated spikes across neurons. Using simulated data, we demonstrate that the Wavelet-Information (WI) method performs better and is more robust to spike time-jitter, background noise, and sample size than well-established approaches, such as principal component analysis, direct estimates of information from digitized spike trains, or a metric-based method. Furthermore, when applied to real spike trains from monkey auditory cortex and from rat barrel cortex, the WI method allows extracting larger amounts of spike timing information. Importantly, the fact that the WI method incorporates multiple time scales makes it robust to the choice of partly arbitrary parameters such as temporal resolution, response window length, number of response features considered, and the number of available trials. These results highlight the potential of the proposed method for accurate and objective assessments of how spike timing encodes information. Copyright © 2015 the American Physiological Society.
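
    A small sketch of the overall idea (not the authors' implementation): binned spike counts from two simulated stimuli are Haar-wavelet transformed, coefficients are ranked by a simple class-separability score standing in for the Shannon-information criterion, and the best coefficient alone is used to decode the stimulus. All data and the scoring rule are illustrative assumptions.

```python
import numpy as np

def haar(x):
    """Full orthonormal Haar wavelet transform of a length-2^k vector."""
    x = np.asarray(x, dtype=float)
    pieces = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        det = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        pieces.append(det)
        x = avg
    pieces.append(x)
    return np.concatenate(pieces[::-1])   # coarse-to-fine coefficient order

rng = np.random.default_rng(1)
n_trials, n_bins = 40, 8
rate_a = np.array([6, 6, 1, 1, 1, 1, 1, 1])   # stimulus A: early burst
rate_b = np.array([1, 1, 1, 1, 1, 1, 6, 6])   # stimulus B: late burst
trials = np.vstack([rng.poisson(rate_a, (n_trials, n_bins)),
                    rng.poisson(rate_b, (n_trials, n_bins))])
labels = np.array([0] * n_trials + [1] * n_trials)

coeffs = np.apply_along_axis(haar, 1, trials)

# Rank coefficients by a simple separability score (stand-in for Shannon information).
mu0, mu1 = coeffs[labels == 0].mean(0), coeffs[labels == 1].mean(0)
score = np.abs(mu0 - mu1) / (coeffs.std(0) + 1e-9)
best = int(np.argmax(score))
print("most informative coefficient:", best)

# Decode stimulus identity from that single coefficient (nearest class mean).
pred = (np.abs(coeffs[:, best] - mu1[best]) < np.abs(coeffs[:, best] - mu0[best])).astype(int)
print("decoding accuracy: %.2f" % (pred == labels).mean())
```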

  18. Selenium speciation in phosphate mine soils and evaluation of a sequential extraction procedure using XAFS.

    Science.gov (United States)

    Favorito, Jessica E; Luxton, Todd P; Eick, Matthew J; Grossl, Paul R

    2017-10-01

    Selenium is a trace element found in western US soils, where ingestion of Se-accumulating plants has resulted in livestock fatalities. Therefore, a reliable understanding of Se speciation and bioavailability is critical for effective mitigation. Sequential extraction procedures (SEP) are often employed to examine Se phases and speciation in contaminated soils but may be limited by experimental conditions. We examined the validity of a SEP using X-ray absorption spectroscopy (XAS) for both whole and a sequence of extracted soils. The sequence included removal of soluble, PO4-extractable, carbonate, amorphous Fe-oxide, crystalline Fe-oxide, organic, and residual Se forms. For whole soils, XANES analyses indicated Se(0) and Se(-II) predominated, with lower amounts of Se(IV) present, related to carbonates and Fe-oxides. Oxidized Se species were more elevated and residual/elemental Se was lower than previous SEP results from ICP-AES suggested. For soils from the SEP sequence, XANES results indicated only partial recovery of carbonate, Fe-oxide and organic Se. This suggests Se was incompletely removed during designated extractions, possibly due to lack of mineral solubilization or reagent specificity. Selenium fractions associated with Fe-oxides were reduced in amount or removed after using hydroxylamine HCl for most soils examined. XANES results indicate partial dissolution of solid-phases may occur during extraction processes. This study demonstrates why precautions should be taken to improve the validity of SEPs. Mineralogical and chemical characterizations should be completed prior to SEP implementation to identify extractable phases or mineral components that may influence extraction effectiveness. Sequential extraction procedures can be appropriately tailored for reliable quantification of speciation in contaminated soils. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Earth Science Data Analytics: Preparing for Extracting Knowledge from Information

    Science.gov (United States)

    Kempler, Steven; Barbieri, Lindsay

    2016-01-01

    Data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information. Data analytics is a broad term that includes data analysis as well as an understanding of the cognitive processes an analyst uses to understand problems and explore data in meaningful ways. Analytics also includes data extraction, transformation, and reduction, utilizing specific tools, techniques, and methods. Turning to data science, definitions of data science sound very similar to those of data analytics (which leads to much of the confusion between the two). But the skills needed for both (co-analyzing large amounts of heterogeneous data, understanding and utilizing relevant tools and techniques, and subject matter expertise), although similar, serve different purposes. Data analytics takes a practitioner's approach, applying expertise and skills to solve issues and gain subject knowledge. Data science is more theoretical (research in itself) in nature, providing strategic actionable insights and new innovative methodologies. Earth Science Data Analytics (ESDA) is the process of examining, preparing, reducing, and analyzing large amounts of spatial (multi-dimensional), temporal, or spectral data using a variety of data types to uncover patterns, correlations and other information, to better understand our Earth. The large variety of datasets (temporal and spatial differences, data types, formats, etc.) invites the need for data analytics skills that cover the science domain and data preparation, reduction, and analysis techniques from a practitioner's point of view. The application of these skills to ESDA is the focus of this presentation. The Earth Science Information Partners (ESIP) Federation Earth Science Data Analytics (ESDA) Cluster was created in recognition of the practical need to facilitate the co-analysis of large amounts of data and information for Earth science. Thus, from a to

  20. USGS compilation of geographic information system (GIS) data representing coal mines and coal-bearing areas in China

    Science.gov (United States)

    Trippi, Michael H.; Belkin, Harvey E.; Dai, Shifeng; Tewalt, Susan J.; Chou, Chiu-Jung; Trippi, Michael H.; Belkin, Harvey E.; Dai, Shifeng; Tewalt, Susan J.; Chou, Chiu-Jung

    2015-01-01

    Geographic information system (GIS) information may facilitate energy studies, which in turn provide input for energy policy decisions. The U.S. Geological Survey (USGS) has compiled geographic information system (GIS) data representing the known coal mine locations and coal-mining areas of China as of 2001. These data are now available for download, and may be used in a GIS for a variety of energy resource and environmental studies of China. Province-scale maps were also created to display the point locations of coal mines and the coal-mining areas. In addition, coal-field outlines from a previously published map by Dai and others (2012) were also digitized and are available for download as a separate GIS data file, and shown in a nation-scale map of China. Chemical data for 332 coal samples from a previous USGS study of China and Taiwan (Tewalt and others, 2010) are included in a downloadable GIS point shapefile, and shown on a nation-scale map of China. A brief report summarizes the methodology used for creation of the shapefiles and the chemical analyses run on the samples.

  1. Information system for preserving culture heritage in areas affected by heavy industry and mining

    Science.gov (United States)

    Pacina, Jan; Kopecký, Jiří; Bedrníková, Lenka; Handrychová, Barbora; Švarcová, Martina; Holá, Markéta; Pončíková, Edita

    2014-05-01

    The natural development of the Ústí region (North-West Bohemia, the Czech Republic) has been affected by human activity over the past hundred years. Heavy industrialization and brown coal mining have completely changed the land use of the region. The open-pit coal mines are destroying the surrounding landscape, including settlements, communications, the hydrological network and the overall natural development of the region. Another factor affecting the natural development of the landscape, land use and settlement was the political situation in 1945 (the end of the Second World War), when the borderland was depopulated. Together these factors caused more than two hundred colonies, villages and towns to vanish during this period. The task of this project is to prepare and offer for public use a comprehensive information system preserving this cultural heritage in the form of processed old maps, aerial imagery, land-use and georelief reconstructions, local studies, and text and photo documents covering the extinct landscape and settlement. A wide range of maps was used for this area: Müller's map of Bohemia (ca. 1720), followed by the 1st, 2nd and 3rd Military Surveys of the Habsburg empire (1792, 1894, 1938), maps of the Stable Cadastre (ca. 1840) and the State Map Derived at a scale of 1:5000 (1953, 1972, 1981). All the maps were processed, georeferenced and hand-digitized, and are further used as base layers for visualization and analysis. The historical aerial imagery was processed by standard photogrammetric methods and covers the years 1938 and 1953 as well as the current state. The other important task covered by this project is georelief reconstruction. We use the old maps and aerial imagery to reconstruct the complete timeline of georelief development, covering the period from 1938 until now. The derived digital terrain models are then analyzed and printed on a 3D printer. Other reconstruction tasks are performed using

  2. WEKA-G: Parallel data mining on computational grids

    Directory of Open Access Journals (Sweden)

    PIMENTA, A.

    2009-12-01

    Full Text Available Data mining is a technology that can extract useful information from large amounts of data. However, mining a database often requires high computational power. To address this problem, this paper presents a tool (Weka-G) that runs the algorithms used in the data mining process in parallel. As the execution environment, we use a computational grid, adding several features within a WAN.

  3. How to Apply Data Mining Technology to the Study of Agricultural Information Data Resources?

    Institute of Scientific and Technical Information of China (English)

    Xindong; WANG; Haoyue; XU; Qian; GAO; Haiyan; CAI; Junhai; LU; Min; LI

    2013-01-01

    This paper briefly describes the definition and methods of data mining. It describes the characteristics of agricultural data (value delivery, specialization, spatio-temporal bidimensionality) and the status of the application of data mining technology in agriculture.

  4. 77 FR 62266 - Proposed Extension of Existing Information Collection; Daily Inspection of Surface Coal Mines...

    Science.gov (United States)

    2012-10-12

    ... understood, and the impact of collection requirements on respondents can be properly assessed. Currently, the... facilities. Highwalls, mining equipment, travelways, and the handling of mining materials each present... better ensure a safe working environment for the miners and a reduction in accidents. II. Desired...

  5. Mining dark information resources to develop new informatics capabilities to support science

    Science.gov (United States)

    Ramachandran, Rahul; Maskey, Manil; Bugbee, Kaylin

    2016-04-01

    Dark information resources are digital resources that organizations collect, process, and store for regular business or operational activities but fail to realize their potential for other purposes. The challenge for any organization is to recognize, identify and effectively exploit these dark information stores. Metadata catalogs at different data centers store dark information resources consisting of structured information, free form descriptions of data and browse images. These information resources are never fully exploited beyond a few fields used for search and discovery. For example, the NASA Earth science catalog holds greater than 6000 data collections, 127 million records for individual files and 67 million browse images. We believe that the information contained in the metadata catalogs and the browse images can be utilized beyond their original design intent to provide new data discovery and exploration pathways to support science and education communities. In this paper we present two research applications using information stored in the metadata catalog in a completely novel way. The first application is designing a data curation service. The objective of the data curation service is to augment the existing data search capabilities. Given a specific atmospheric phenomenon, the data curation service returns the user a ranked list of relevant data sets. Different fields in the metadata records including textual descriptions are mined. A specialized relevancy ranking algorithm has been developed that uses a "bag of words" to define phenomena along with an ensemble of known approaches such as the Jaccard Coefficient, Cosine Similarity and Zone ranking to rank the data sets. This approach is also extended to map from the data set level to data file variable level. The second application is focused on providing a service where a user can search and discover browse images containing specific phenomena from the vast catalog. This service will aid researchers
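
    A minimal sketch of the relevancy-ranking idea described above: hypothetical metadata descriptions are scored against a "bag of words" defining a phenomenon using Jaccard and cosine measures. The dataset names, descriptions and the equal weighting of the two scores are assumptions for illustration, not the service's actual ensemble.

```python
import math
from collections import Counter

datasets = {
    "MODIS Aerosol Optical Depth": "daily global aerosol optical depth from dust and smoke plumes",
    "TRMM Precipitation":          "tropical rainfall and precipitation rate estimates",
    "CALIPSO Lidar Profiles":      "vertical profiles of aerosol and cloud layers including dust",
}
phenomenon = "dust storm aerosol plume"   # the "bag of words" defining the phenomenon

def tokens(text):
    return text.lower().split()

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    return dot / (math.sqrt(sum(v * v for v in ca.values())) *
                  math.sqrt(sum(v * v for v in cb.values())))

query = tokens(phenomenon)
scores = {name: 0.5 * jaccard(query, tokens(desc)) + 0.5 * cosine(query, tokens(desc))
          for name, desc in datasets.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{s:.3f}  {name}")
```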

  6. Point Cloud Classification of Tesserae from Terrestrial Laser Data Combined with Dense Image Matching for Archaeological Information Extraction

    Science.gov (United States)

    Poux, F.; Neuville, R.; Billen, R.

    2017-08-01

    Reasoning from information extracted by point cloud data mining allows contextual adaptation and fast decision making. However, to achieve this perceptive level, a point cloud must be semantically rich, retaining relevant information for the end user. This paper presents an automatic knowledge-based method for pre-processing multi-sensory data and classifying a hybrid point cloud from both terrestrial laser scanning and dense image matching. Using 18 features, including sensor-biased data, each tessera in the high-density point cloud of the 3D-captured complex mosaics of Germigny-des-Prés (France) is segmented via a colour-based multi-scale abstraction feature extracting connectivity. A 2D surface and outline polygon of each tessera is generated by RANSAC plane extraction and convex hull fitting. Knowledge is then used to classify each tessera based on its size, surface, shape, material properties and its neighbours' classes. The detection and semantic enrichment method shows promising results of 94% correct semantization, a first step toward the creation of an archaeological smart point cloud.
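
    The sketch below illustrates the two geometric steps named in the abstract, RANSAC plane extraction followed by a convex-hull outline, on synthetic points; the tolerance, iteration count and data are assumptions, not the paper's parameters.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
n = 300
# Points roughly on the z=0 plane, plus a few gross outliers.
pts = np.column_stack([rng.uniform(0, 1, n), rng.uniform(0, 1, n), rng.normal(0, 0.002, n)])
pts[::10] += rng.normal(0, 0.05, (30, 3))

def ransac_plane(points, n_iter=200, tol=0.005, rng=rng):
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue
        normal /= norm
        dist = np.abs((points - p1) @ normal)    # point-to-plane distances
        inliers = dist < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

inliers = ransac_plane(pts)
plane_pts = pts[inliers]

# Outline polygon: convex hull of the inliers projected onto the XY plane.
hull = ConvexHull(plane_pts[:, :2])
print("inliers:", inliers.sum(), "hull vertices:", len(hull.vertices))
```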

  7. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia.

    Science.gov (United States)

    Kavakiotis, Ioannis; Xochelli, Aliki; Agathangelidis, Andreas; Tsoumakas, Grigorios; Maglaveras, Nicos; Stamatopoulos, Kostas; Hadzidimitriou, Anastasia; Vlahavas, Ioannis; Chouvarda, Ioanna

    2016-06-06

    Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high-quality datasets for SHM analysis in the IG receptors of CLL patients. This dataset is used as the basis for a higher-level integration procedure inspired by social choice theory. This is applied in the Towards analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families through SHM. The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset using voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets as regards SHM-related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage of movement towards pseudogenes was found in all CLL subsets. This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach to many as yet unanswered, clinically relevant biological questions.

  8. Multisource classification and pattern recognition methods for polar geospatial information mining using WorldView-2 data

    Science.gov (United States)

    Khopkar, Parag S.; Jawak, Shridhar D.; Luis, Alvarinho J.

    2016-04-01

    The current research study emphasizes the importance of advanced digital image processing methods for delineating various LULC features. In the case of Antarctica, the present land cover (snow/ice, landmass, water, vegetation, etc.) and the present land use (research stations of various nations) need to be mapped accurately to support hassle-free routine activities. Geo-location has become a central part of geoscience studies. In this paper we locate the three most important features (snow/ice, landmass, and water) and extract their extents using multisource classification (image fusion/pansharpening) and pattern recognition (supervised/unsupervised methods, index ratio methods). Innovation in developing spectral index ratios led us to a unique ratio named the Normalized Difference Landmass Index (NDLI), which performed better (avg. bias: 51.99 m) than other ratios such as the Normalized Difference Snow/Ice Index (NDSII) (avg. bias: -1572.11 m) and the Normalized Difference Water Index (NDWI) (avg. bias: 1886.60 m). The trial-and-error methodology quantifies the performance of the classification methods relative to one another as well as that of the fusion methods. In the present study, the classifiers used (Mahalanobis and Winner Takes All) performed better (avg. bias: 122.16 m) than the spectral index ratios (avg. bias: 620.16 m). The study also revealed that the newly introduced WorldView-2 bands, band 1 (Coastal Blue), 4 (Yellow), 6 (Red-edge) and 8 (Near Infrared-2), along with the traditional bands, have the capacity to mine polar geospatial information with high accuracy and efficiency.
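
    All of the ratios named above share the normalized-difference form (b1 - b2)/(b1 + b2). The sketch below applies it to two hypothetical WorldView-2 band chips; the band pairing and the threshold are illustrative only, since the abstract does not spell out NDLI's exact band combination.

```python
import numpy as np

# Hypothetical 3x3 chips of two WorldView-2 bands (reflectance values).
coastal_blue = np.array([[0.08, 0.09, 0.30],
                         [0.07, 0.10, 0.31],
                         [0.09, 0.08, 0.29]])
nir2 = np.array([[0.35, 0.33, 0.05],
                 [0.36, 0.34, 0.06],
                 [0.34, 0.36, 0.05]])

def normalized_difference(b1, b2, eps=1e-9):
    # Generic (b1 - b2)/(b1 + b2) index, the construction behind NDWI, NDSII and NDLI.
    return (b1 - b2) / (b1 + b2 + eps)

index = normalized_difference(nir2, coastal_blue)
landmass_mask = index > 0            # illustrative threshold separating two classes
print(np.round(index, 2))
print(landmass_mask)
```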

  9. CHANGE OF PARADIGM IN UNDERGROUND HARD COAL MINING THROUGH EXTRACTION AND CAPITALIZATION OF METHANE FOR ENERGY PRODUCTION

    Directory of Open Access Journals (Sweden)

    Valeriu PLESEA

    2014-05-01

    Full Text Available Besides oil and gas, coal is the most important fossil fuel for energy production. In our country's energy mix, domestic gas production covers 80% of the required annual consumption of about 14 billion cubic meters, with the remaining 20% secured through imports from the Russian company Gazprom. The share of coal in the National Power System (NPS) is 24%, and it is one of the most profitable energy production sources given the continuous increase in gas prices and the dependence on external suppliers. Taking into account the pollution of the atmosphere and global warming caused by the large releases of greenhouse gases and carbon dioxide from coal burning for energy production in thermal power plants, it is necessary to identify new solutions for keeping the environment clean. Such a solution, presented in the study and analysis shown in this paper, is the extraction and capitalization of methane from coal deposits and from the underground spaces left free after mine closures. Underground methane extraction is considered all the more opportune because, during coal exploitation, large quantities of this combustible gas are released and exhausted into the atmosphere by the surface degasification and ventilation stations, representing an important pollution factor for the environment, as a greenhouse gas with a global warming potential (GWP) about 21 times higher than that of carbon dioxide.

  10. Potential for improved extraction of tellurium as a byproduct of current copper mining processes

    Science.gov (United States)

    Hayes, S. M.; Spaleta, K. J.; Skidmore, A. E.

    2016-12-01

    Tellurium (Te) is classified as a critical element due to its increasing use in high technology applications, low average crustal abundance (3 μg kg-1), and primary source as a byproduct of copper extraction. Although Te can be readily recovered from copper processing, previous studies have estimated a 4 percent extraction efficiency, and few studies have addressed Te behavior during the entire copper extraction process. The goals of the present study are to perform a mass balance examining Te behavior during copper extraction and to connect these observations with mineralogy of Te-bearing phases which are essential first steps in devising ways to optimize Te recovery. Our preliminary mass balance results indicate that less than 3 percent of Te present in copper ore is recovered, with particularly high losses during initial concentration of copper ore minerals by flotation. Tellurium is present in the ore in telluride minerals (e.g., Bi-Te-S phases, altaite, and Ag-S-Se-Te phases identified using electron microprobe) with limited substitution into sulfide minerals (possibly 10 mg kg-1 Te in bulk pyrite and chalcopyrite). This work has also identified Te accumulation in solid-phase intermediate extraction products that could be further processed to recover Te, including smelter dusts (158 mg kg-1) and pressed anode slimes (2.7 percent by mass). In both the smelter dusts and anode slimes, X-ray absorption spectroscopy indicates that about two thirds of the Te is present as reduced tellurides. In anode slimes, electron microscopy shows that the remaining Te is present in an oxidized form in a complex Te-bearing oxidate phase also containing Pb, Cu, Ag, As, Sb, and S. These results clearly indicate that more efficient, increased recovery of Te may be possible, likely at minimal expense from operating copper processing operations, thereby providing more Te for manufacturing of products such as inexpensive high-efficiency solar panels.

  11. CONAN : Text Mining in the Biomedical Domain

    NARCIS (Netherlands)

    Malik, R.

    2006-01-01

    This thesis is about text mining: extracting important information from the literature. In recent years, the number of biomedical articles and journals has grown exponentially. Scientists might not find the information they want because of the large number of publications. Therefore a system was cons

  13. String Mining in Bioinformatics

    Science.gov (United States)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying biological sequences, DNA, RNA, and proteins, at the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has not been explicitly embraced in either the data mining or the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data mining" is almost missing from the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
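
    As a toy instance of the inter-molecular case described above, the sketch below finds the longest common substring of two short DNA strings by dynamic programming; suffix-tree or suffix-array methods solve the same problem in linear time for genome-scale data, and the sequences here are invented.

```python
def longest_common_substring(s, t):
    # Dynamic programming over prefix pairs; cur[j] is the length of the common
    # suffix of s[:i] and t[:j].
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]

seq1 = "ACGTTGACCTAGGAT"
seq2 = "TTGACCTAGCATCGA"
print(longest_common_substring(seq1, seq2))   # -> "TTGACCTAG"
```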

  14. Healthcare information systems: data mining methods in the creation of a clinical recommender system

    Science.gov (United States)

    Duan, L.; Street, W. N.; Xu, E.

    2011-05-01

    Recommender systems have been extensively studied to present items, such as movies, music and books that are likely of interest to the user. Researchers have indicated that integrated medical information systems are becoming an essential part of the modern healthcare systems. Such systems have evolved to an integrated enterprise-wide system. In particular, such systems are considered as a type of enterprise information systems or ERP system addressing healthcare industry sector needs. As part of efforts, nursing care plan recommender systems can provide clinical decision support, nursing education, clinical quality control, and serve as a complement to existing practice guidelines. We propose to use correlations among nursing diagnoses, outcomes and interventions to create a recommender system for constructing nursing care plans. In the current study, we used nursing diagnosis data to develop the methodology. Our system utilises a prefix-tree structure common in itemset mining to construct a ranked list of suggested care plan items based on previously-entered items. Unlike common commercial systems, our system makes sequential recommendations based on user interaction, modifying a ranked list of suggested items at each step in care plan construction. We rank items based on traditional association-rule measures such as support and confidence, as well as a novel measure that anticipates which selections might improve the quality of future rankings. Since the multi-step nature of our recommendations presents problems for traditional evaluation measures, we also present a new evaluation method based on average ranking position and use it to test the effectiveness of different recommendation strategies.
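
    A minimal sketch of the association-rule measures the recommender ranks by, computed over hypothetical care-plan itemsets; the item names and plans are invented for illustration, and the prefix-tree structure and novel ranking measure of the paper are not reproduced here.

```python
from itertools import combinations
from collections import Counter

care_plans = [
    {"acute pain", "impaired mobility", "pain management"},
    {"acute pain", "pain management", "fall prevention"},
    {"impaired mobility", "fall prevention"},
    {"acute pain", "pain management"},
]

n = len(care_plans)
item_counts = Counter(i for plan in care_plans for i in plan)
pair_counts = Counter(frozenset(p) for plan in care_plans for p in combinations(sorted(plan), 2))

def rule_stats(antecedent, consequent):
    # support = P(antecedent and consequent); confidence = P(consequent | antecedent)
    support = pair_counts[frozenset((antecedent, consequent))] / n
    confidence = pair_counts[frozenset((antecedent, consequent))] / item_counts[antecedent]
    return support, confidence

s, c = rule_stats("acute pain", "pain management")
print(f"acute pain -> pain management: support={s:.2f}, confidence={c:.2f}")
```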

  15. Data Mining Concepts with Customer Relationship Management

    Directory of Open Access Journals (Sweden)

    Mubeena Shaik,

    2014-07-01

    Full Text Available Data mining is important in creating a great e-business experience. Data mining is the systematic way of extracting information from data. Many companies are developing an online presence to sell or promote their products and services, and most internet users are familiar with online shopping concepts and the steps needed to purchase a product. The e-commerce landscape is the relationship between customer relationship management (sales, marketing and support), the internet and suppliers.

  16. Data mining and business analytics with R

    CERN Document Server

    Ledolter, Johannes

    2013-01-01

    Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets. As a result, readers are provided with the needed guidance to model and interpret complicated data and become adept at building powerful models for prediction and classification. Highlighting both underlying concepts and practical computational skills, Data Mining

  17. Medicaid Analytic eXtract (MAX) General Information

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicaid Analytic eXtract (MAX) data is a set of person-level data files on Medicaid eligibility, service utilization, and payments. The MAX data are created to...

  19. A Noise Addition Scheme in Decision Tree for Privacy Preserving Data Mining

    CERN Document Server

    Kadampur, Mohammad Ali

    2010-01-01

    Data mining deals with automatic extraction of previously unknown patterns from large amounts of data. Organizations all over the world handle large amounts of data and are dependent on mining gigantic data sets for expansion of their enterprises. These data sets typically contain sensitive individual information, which consequently get exposed to the other parties. Though we cannot deny the benefits of knowledge discovery that comes through data mining, we should also ensure that data privacy is maintained in the event of data mining. Privacy preserving data mining is a specialized activity in which the data privacy is ensured during data mining. Data privacy is as important as the extracted knowledge and efforts that guarantee data privacy during data mining are encouraged. In this paper we propose a strategy that protects the data privacy during decision tree analysis of data mining process. We propose to add specific noise to the numeric attributes after exploring the decision tree of the original data. T...
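
    A minimal sketch of the general noise-addition idea (not the paper's decision-tree-guided scheme): zero-mean Gaussian noise with an illustrative scale is added to a numeric attribute so that individual values are masked while the aggregate statistics used by the miner stay close to the originals. The data and noise calibration are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
salaries = np.array([42_000, 58_500, 61_200, 39_800, 75_000, 52_300], dtype=float)

noise_scale = 0.05 * salaries.std()            # illustrative calibration of the noise
perturbed = salaries + rng.normal(0.0, noise_scale, size=salaries.shape)

print("original mean %.1f  perturbed mean %.1f" % (salaries.mean(), perturbed.mean()))
print("max individual change %.1f" % np.abs(perturbed - salaries).max())
```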

  20. Text mining for systems biology.

    Science.gov (United States)

    Fluck, Juliane; Hofmann-Apitius, Martin

    2014-02-01

    Scientific communication in biomedicine is, by and large, still text based. Text mining technologies for the automated extraction of useful biomedical information from unstructured text that can be directly used for systems biology modelling have been substantially improved over the past few years. In this review, we underline the importance of named entity recognition and relationship extraction as fundamental approaches that are relevant to systems biology. Furthermore, we emphasize the role of publicly organized scientific benchmarking challenges that reflect the current status of text-mining technology and are important in moving the entire field forward. Given further interdisciplinary development of systems biology-orientated ontologies and training corpora, we expect a steadily increasing impact of text-mining technology on systems biology in the future.

  1. Mobile helium-3 mining and extraction system and its benefits toward lunar base self-sufficiency

    Science.gov (United States)

    Sviatoslavsky, I. N.; Jacobs, M.

    The paper examines the issues of extracting He-3 from lunar regolith using mobile miners and its implications for the fusion-energy resupply of a lunar base. These issues include excavating, conveying, beneficiating, and heating the regolith, as well as collecting, transporting, and condensing the released solar-wind products. The benefits of such an operation toward lunar base self-sufficiency are described along with terrestrial benefits.

  2. Feature extraction and analysis of online reviews for the recommendation of books using opinion mining technique

    OpenAIRE

    Shahab Saquib Sohail; Jamshed Siddiqui; Rashid Ali

    2016-01-01

    The customer's review plays an important role in deciding the purchasing behaviour for online shopping as a customer prefers to get the opinion of other customers by observing their opinion through online products’ reviews, blogs and social networking sites, etc. The customer's reviews reflect the customer's sentiments and have a substantial significance for the products being sold online including electronic gadgets, movies, house hold appliances and books. Hence, extracting the exact featur...

  3. Feature extraction and analysis of online reviews for the recommendation of books using opinion mining technique

    Directory of Open Access Journals (Sweden)

    Shahab Saquib Sohail

    2016-09-01

    Full Text Available Customer reviews play an important role in purchasing behaviour for online shopping, as a customer prefers to get the opinions of other customers through online product reviews, blogs, social networking sites, etc. Customer reviews reflect the customers' sentiments and have substantial significance for products sold online, including electronic gadgets, movies, household appliances and books. Hence, extracting the exact features of products by analyzing the review text requires considerable effort and human intelligence. In this paper we analyze the online reviews available for books and extract book features from the reviews using human intelligence. We propose a technique to categorize the features of books from customer reviews. The extracted features may help in deciding which books to recommend to readers. The ultimate goal of the work is to fulfil users' requirements and provide them with their desired books. We evaluated our categorization method with users themselves, surveying qualified persons for the books concerned. The survey results show high precision for the categorized features, which clearly indicates that the proposed method is useful and appealing. The proposed technique may help in recommending the best books to interested readers and may also be generalized to recommend any product to users.

  4. 77 FR 2760 - Proposed Information Collection Request (ICR) for the Mining Voice in the Workplace Survey...

    Science.gov (United States)

    2012-01-19

    ... understanding of those rights, and their ability to exercise these rights without fear of discrimination or... that reflects mining community cultures and practices. Thus, DOL is performing a pilot study...

  6. Improvement Evaluation on Ceramic Roof Extraction Using WORLDVIEW-2 Imagery and Geographic Data Mining Approach

    Science.gov (United States)

    Brum-Bastos, V. S.; Ribeiro, B. M. G.; Pinho, C. M. D.; Korting, T. S.; Fonseca, L. M. G.

    2016-06-01

    Advances in geotechnologies and in remote sensing have improved the analysis of urban environments. The new sensors are increasingly suited to urban studies, due to enhancements in spatial, spectral and radiometric resolution. Urban environments present high heterogeneity, which cannot be tackled using pixel-based approaches on high resolution images. Geographic Object-Based Image Analysis (GEOBIA) has been consolidated as a methodology for urban land use and cover monitoring; however, classification of high resolution images is still troublesome. This study aims to assess the improvement in ceramic roof classification using WorldView-2 images due to the addition of 4 new bands besides the standard "Blue-Green-Red-Near Infrared" bands. Our methodology combines GEOBIA, the C4.5 classification tree algorithm, Monte Carlo simulation and statistical tests for classification accuracy. Two sample groups were considered: 1) eight multispectral and panchromatic bands, and 2) four multispectral and panchromatic bands, representing previous high-resolution sensors. The C4.5 algorithm generates a decision tree that can be used for classification; smaller decision trees are closer to the semantic networks produced by experts on GEOBIA, while bigger trees are not straightforward to implement manually but are more accurate. The choice of a big or small tree relies on the user's skill to implement it. This study aims to determine for what kind of user the addition of the 4 new bands might be beneficial: 1) the common user (smaller trees) or 2) a more skilled user with coding and/or data mining abilities (bigger trees). Overall, the classification was improved by the addition of the four new bands for both types of users.
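
    A minimal sketch of the classification-tree step, using scikit-learn's CART implementation as a stand-in for C4.5 and synthetic per-segment band statistics; the features, labels and depth limit are assumptions chosen to mirror the small-tree versus big-tree trade-off discussed above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 200
# Hypothetical per-segment features: mean red, mean NIR, mean yellow band.
X = rng.uniform(0, 1, size=(n, 3))
# Toy labelling rule: "ceramic roof" when red is high and NIR is low.
y = ((X[:, 0] > 0.6) & (X[:, 1] < 0.4)).astype(int)

small_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)   # easy to read
big_tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X, y)  # more accurate

print(export_text(small_tree, feature_names=["red", "nir", "yellow"]))
print("small-tree accuracy:", small_tree.score(X, y))
print("big-tree accuracy:  ", big_tree.score(X, y))
```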

  7. Radiological evaluation near three former uranium extraction mines in the department of Creuse - year 2007

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2007-07-01

    The observations made at the three sites of Chaumaillat, Ribiere and Grands Champs demonstrate the existence of an atypical radiological situation that appears to be marked by past mining activities. While the geochemical context can sometimes be at the origin of anomalies in sediments and muds, the regional industrial context, combined with the high measured uranium values, leads us to favour a human origin for these anomalies. The presence of almost pure uranium is presumed to result from past on-site ore treatment activities (leaching) to extract the raw material (yellow cake) used for the manufacture of nuclear fuel. However, this observation at the Grands Champs site is surprising given the absence of any on-site treatment activity declared by the operator and the absence of residue storage. Given the accessibility of these sites to the public and the discontinuation of all monitoring, a follow-up study seems necessary to assess the extent of the radiological anomalies and their persistent impact on the environment. (N.C.)

  8. Extracting Impervious Surface and Its Change Information Using Satellite Remote Sensing Data

    Institute of Scientific and Technical Information of China (English)

    马雪梅; 李希峰

    2008-01-01

    Impervious surface is one of the important parameters of watershed water-cycle simulation; its scientific estimation has significant practical value for urban water quantity and process simulation, diffuse pollution estimation and the forecasting of climate change effects. The objective of this research is to obtain impervious surface information and its dynamic change. Using a computer-assisted field method, decision tree and data mining technologies were applied to extract impervious surface information in the study region from Landsat TM data for 1988, 1994 and 2002. The results suggest that the accuracy of impervious surface information extraction in the study area reached more than 94.4% for the 2002 image. On this basis, a mixed method was used to extract the location and types of impervious surface change. The overall monitoring accuracy reached 89%, which meets the demands of hydrological models.

  9. Automatic Data Extraction from Websites for Generating Aquatic Product Market Information

    Institute of Scientific and Technical Information of China (English)

    YUAN Hong-chun; CHEN Ying; SUN Yue-fu

    2006-01-01

    The massive web-based information resources have led to an increasing demand for effective automatic retrieval of target information for web applications. This paper introduces a web-based data extraction tool that deploys various algorithms to locate, extract and filter tabular data from HTML pages and to transform them into new web-based representations. The tool has been applied in an aquaculture web application platform for extracting and generating aquatic product market information. Results prove that this tool is very effective in extracting the required data from web pages.
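
    A minimal sketch of locating and extracting tabular data from an HTML page using only the Python standard library; the page snippet and column names are invented, and the actual tool's algorithms are not reproduced here.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

# Hypothetical market-price table embedded in a web page.
html = """<table>
<tr><th>Product</th><th>Market</th><th>Price (CNY/kg)</th></tr>
<tr><td>Carp</td><td>Shanghai</td><td>14.2</td></tr>
<tr><td>Shrimp</td><td>Qingdao</td><td>86.0</td></tr>
</table>"""

parser = TableExtractor()
parser.feed(html)
header, *records = parser.rows
print(dict(zip(header, records[0])))   # first record as a field->value mapping
```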

  10. Semantic information extracting system for classification of radiological reports in radiology information system (RIS)

    Science.gov (United States)

    Shi, Liehang; Ling, Tonghui; Zhang, Jianguo

    2016-03-01

    Radiologists currently use a variety of terminologies and standards in most hospitals in China, and there are even multiple terminologies in use for different sections within one department. In this presentation, we introduce a medical semantic comprehension system (MedSCS) to extract semantic information about clinical findings and conclusions from free-text radiology reports so that the reports can be classified correctly based on medical term indexing standards such as RadLex or SNOMED-CT. Our system (MedSCS) is based on both rule-based methods and statistics-based methods, which improve the performance and the scalability of MedSCS. In order to evaluate the overall performance of the system and measure the accuracy of the outcomes, we developed computational methods to calculate precision, recall, F-score and exact confidence intervals.
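
    The evaluation measures mentioned above can be computed as in the following sketch; the gold and predicted labels are hypothetical, and the exact confidence-interval calculation used by MedSCS is not reproduced.

```python
def prf(gold, predicted, positive="abnormal"):
    # Precision, recall and F-score for a binary report-classification outcome.
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold      = ["abnormal", "normal", "abnormal", "abnormal", "normal", "normal"]
predicted = ["abnormal", "normal", "normal",   "abnormal", "abnormal", "normal"]
print("precision=%.2f recall=%.2f F1=%.2f" % prf(gold, predicted))
```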

  11. Extraction of Rare Earth Mining Areas Using an Object-Oriented Classification Approach Based on Texture Characteristics

    Institute of Scientific and Technical Information of China (English)

    彭燕; 何国金; 曹辉

    2013-01-01

    In-situ leaching and non-in-situ leaching (including pond leaching and in-situ heap leaching) are the common exploitation methods for rare earth ore in Jiangxi Province. Taking Dingnan County in Jiangxi Province as the study region, an object-oriented classification approach combining texture information, object size and context features was applied to extract rare earth mining area information from a 2010 ALOS fusion image of the study area. The result shows that this method can effectively distinguish between in-situ leaching mining areas and non-in-situ leaching mining areas according to the differences in the rare earth ore exploitation methods. The effectiveness of the information extraction approach was verified using a confusion matrix and a field survey; both verification methods showed overall classification accuracies as high as 85%. Furthermore, rare earth mining area information was extracted from a 2001 SPOT fusion image by a backtracking method, based on the above information extraction results for 2010. Finally, according to the rare earth mining area distribution maps, the current exploitation status of the rare earth mines and the exploitation changes over the ten-year period from 2001 to 2010 were analysed. This research can provide data support and a technical reference for environmental remote sensing monitoring of this region and of similar mineral resource exploitation areas.

  12. USE OF SEQUENTIAL EXTRACTION TO EVALUATE THE HEAVY METALS IN MINING WASTES. (R825549C016)

    Science.gov (United States)

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  13. Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    NARCIS (Netherlands)

    Habib, Mena B.; Keulen, van Maurice

    2011-01-01

    Information extraction, data integration, and uncertain data management are different areas of research that have received much attention in the last two decades. Much research has tackled these areas individually. However, information extraction systems should be integrated with data integration methods…

  14. Semantic Preview Benefit in English: Individual Differences in the Extraction and Use of Parafoveal Semantic Information

    Science.gov (United States)

    Veldre, Aaron; Andrews, Sally

    2016-01-01

    Although there is robust evidence that skilled readers of English extract and use orthographic and phonological information from the parafovea to facilitate word identification, semantic preview benefits have been elusive. We sought to establish whether individual differences in the extraction and/or use of parafoveal semantic information could…

  15. An Effective Approach to Biomedical Information Extraction with Limited Training Data

    Science.gov (United States)

    Jonnalagadda, Siddhartha

    2011-01-01

    In the current millennium, extensive use of computers and the internet caused an exponential increase in information. Few research areas are as important as information extraction, which primarily involves extracting concepts and the relations between them from free text. Limitations in the size of training data, lack of lexicons and lack of…

  16. Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    NARCIS (Netherlands)

    Habib, Mena Badieh; van Keulen, Maurice

    2011-01-01

    Information extraction, data integration, and uncertain data management are different areas of research that have received much attention in the last two decades. Much research has tackled these areas individually. However, information extraction systems should be integrated with data integration methods…

  17. Towards an information extraction and knowledge formation framework based on Shannon entropy

    Directory of Open Access Journals (Sweden)

    Iliescu Dragoș

    2017-01-01

    Full Text Available The subject of information quantity is approached in this paper, with the specific domain of nonconforming product management taken as the information source. The work represents a case study: raw data were gathered from a heavy industrial works company, and information extraction and knowledge formation are considered herein. The method used for information quantity estimation is based on the Shannon entropy formula. The information and entropy spectra are decomposed and analysed for the extraction of specific information and the formation of knowledge-that. The result of the entropy analysis points out the information that needs to be acquired by the organisation involved, presented as a specific knowledge type.
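
    The Shannon entropy formula referred to above is H = -Σ p_i log2 p_i over the probabilities of the observed categories. A minimal sketch, with the nonconformity categories and counts invented purely for illustration:

    ```python
    # Minimal sketch of the Shannon entropy computation referred to above.
    # The nonconformity categories and counts are invented for illustration.
    import math
    from collections import Counter

    observations = ["dimension", "surface", "dimension", "material",
                    "surface", "dimension", "assembly"]
    counts = Counter(observations)
    total = sum(counts.values())

    # H = -sum(p_i * log2(p_i)) over observed categories
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    print(f"H = {entropy:.3f} bits per nonconformity record")
    ```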

  18. Text Mining to inform construction of Earth and Environmental Science Ontologies

    Science.gov (United States)

    Schildhauer, M.; Adams, B.; Rebich Hespanha, S.

    2013-12-01

    There is a clear need for better semantic representation of Earth and environmental concepts, to facilitate more effective discovery and re-use of information resources relevant to scientists doing integrative research. In order to develop general-purpose Earth and environmental science ontologies, however, it is necessary to represent concepts and relationships that span usage across multiple disciplines and scientific specialties. Traditional knowledge modeling through ontologies utilizes expert knowledge but inevitably favors the particular perspectives of the ontology engineers, as well as the domain experts who interacted with them. This often leads to ontologies that lack robust coverage of synonymy, while also missing important relationships among concepts that can be extremely useful for working scientists to be aware of. In this presentation we will discuss methods we have developed that utilize statistical topic modeling on a large corpus of Earth and environmental science articles, to expand coverage and disclose relationships among concepts in the Earth sciences. For our work we collected a corpus of over 121,000 abstracts from many of the top Earth and environmental science journals. We performed latent Dirichlet allocation topic modeling on this corpus to discover a set of latent topics, which consist of terms that commonly co-occur in abstracts. We match terms in the topics to concept labels in existing ontologies to reveal gaps, and we examine which terms are commonly associated in natural language discourse, to identify relationships that are important to formally model in ontologies. Our text mining methodology uncovers significant gaps in the content of some popular existing ontologies, and we show how, through a workflow involving human interpretation of topic models, we can bootstrap ontologies to have much better coverage and richer semantics. Because we base our methods directly on what working scientists are communicating about their
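
    A minimal sketch of the topic-modelling step described above, using scikit-learn's latent Dirichlet allocation on a toy list of abstracts and printing the top terms of each topic; the corpus, the number of topics and the vectoriser settings are assumptions for illustration, not the authors' configuration.

    ```python
    # Sketch: latent Dirichlet allocation over a corpus of abstracts, then
    # printing top terms per topic as candidate concept labels. The corpus
    # and parameters are illustrative stand-ins, not the authors' setup.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    abstracts = [
        "glacier mass balance and climate warming in alpine catchments",
        "soil carbon flux measurements under drought stress",
        "river discharge and sediment transport after wildfire",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    doc_term = vectorizer.fit_transform(abstracts)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(doc_term)

    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[::-1][:5]]
        print(f"topic {k}: {', '.join(top)}")
    ```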

  19. Data Mining and Statistics for Decision Making

    CERN Document Server

    Tufféry, Stéphane

    2011-01-01

    Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized lin

  20. Opinion mining feature-level using Naive Bayes and feature extraction based analysis dependencies

    Science.gov (United States)

    Sanda, Regi; Baizal, Z. K. Abdurahman; Nhita, Fhira

    2015-12-01

    The development of the internet and technology has had a major impact and has given rise to a new kind of business called e-commerce. Many e-commerce sites make transactions convenient, and consumers can also provide reviews or opinions on the products they purchase. These opinions can be used by both consumers and producers: consumers learn the advantages and disadvantages of particular features of a product, while producers can analyse their own strengths and weaknesses as well as those of competitors' products. With so many opinions, a method is needed so that a reader can grasp the gist of an opinion as a whole. The idea emerged from review summarization, which summarizes the overall opinion based on the sentiment and features it contains. In this study, the domain of focus is digital cameras. The research consisted of four steps: 1) giving the system the knowledge to recognize the semantic orientation of an opinion, 2) identifying the features of a product, 3) identifying whether an opinion is positive or negative, and 4) summarizing the results. The methods discussed include Naïve Bayes for sentiment classification, a feature extraction algorithm based on dependency analysis, which is one of the tools in Natural Language Processing (NLP), and a knowledge-based dictionary, which is useful for handling implicit features. The end result of the research is a summary that contains a set of consumer reviews organized by feature and sentiment. With the proposed method, sentiment classification accuracy reaches 81.2% for positive test data and 80.2% for negative test data, and feature extraction accuracy reaches 90.3%.
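
    A minimal sketch of the sentiment-classification step, using a multinomial Naive Bayes classifier over bag-of-words features; the tiny review set and labels are invented for illustration, and the dependency-based feature extraction described above is not reproduced here.

    ```python
    # Sketch: Naive Bayes sentiment classification of product reviews.
    # The reviews and labels are invented; the dependency-based feature
    # extraction from the paper is not reproduced here.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    reviews = ["battery life is excellent", "lens is sharp and bright",
               "autofocus is slow and noisy", "screen scratches too easily"]
    labels = ["pos", "pos", "neg", "neg"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(reviews)

    clf = MultinomialNB().fit(X, labels)
    print(clf.predict(vectorizer.transform(["battery drains too fast"])))
    ```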

  1. Application of Text Mining to Extract Hotel Attributes and Construct Perceptual Map of Five Star Hotels from Online Review: Study of Jakarta and Singapore Five-Star Hotels

    Directory of Open Access Journals (Sweden)

    Arga Hananto

    2015-12-01

    Full Text Available The use of post-purchase online consumer reviews in hotel attribute studies is still scarce in the literature. Arguably, post-purchase online review data would yield more accurate attributes that consumers actually consider in their purchase decisions. This study aims to extract attributes from two samples of five-star hotel reviews (Jakarta and Singapore) with a text mining methodology. In addition, it aims to describe the positioning of five-star hotels in Jakarta and Singapore based on the extracted attributes using Correspondence Analysis. The study finds that reviewers of five-star hotels in both cities mention similar attributes such as service, staff, club, location, pool and food. Attributes derived from text mining appear to be viable input for building a fairly accurate positioning map of hotels. This study has demonstrated the viability of online reviews as a source of data for hotel attribute and positioning studies.

  2. An Intelligent Approach For Mining Frequent Spatial Objects In Geographic Information System

    Directory of Open Access Journals (Sweden)

    Animesh Tripathy

    2010-11-01

    Full Text Available Spatial data mining is based on the correlation of spatial objects in space. Mining frequent patterns from spatial database systems has always remained a challenge for researchers. The first law of geography, "everything is related to everything else, but nearby things are more related than distant things", suggests that values taken from samples of spatial data near to each other tend to be more similar than those taken farther apart. This tendency is termed spatial autocorrelation or spatial dependence. It is natural that most spatial data are not independent; they have high autocorrelation. In this paper, we propose an enhancement of an existing mining algorithm for efficiently mining frequent patterns for spatial objects occurring in space, such as a city located near a river. The frequency of each spatial object in relation to other objects tends to determine multiple occurrences of the same object. We further enhance the proposed approach by using a numerical method. This method uses a tree-structure-based methodology for mining frequent patterns, considering the frequency of each object stored at each node of the tree. Experimental results suggest significant improvement in finding valid frequent patterns over existing methods.
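
    As a rough illustration of the support counting that frequent spatial pattern mining builds on, the sketch below counts how often pairs of spatial object types occur within a distance threshold of each other; the point data and threshold are invented, and this is not the tree-based algorithm proposed in the paper.

    ```python
    # Sketch: support counts for pairs of spatial object types whose
    # instances lie within a distance threshold (a co-location pattern).
    # Points and threshold are invented; this is not the paper's
    # tree-based algorithm, only the underlying counting idea.
    from itertools import combinations
    from collections import Counter
    import math

    objects = [("city", (2.0, 3.0)), ("river", (2.5, 3.2)),
               ("city", (8.0, 1.0)), ("mine", (8.3, 1.4)),
               ("river", (8.1, 0.9))]
    threshold = 1.0

    support = Counter()
    for (t1, p1), (t2, p2) in combinations(objects, 2):
        if t1 != t2 and math.dist(p1, p2) <= threshold:
            support[tuple(sorted((t1, t2)))] += 1

    print(support.most_common())  # frequent nearby type pairs
    ```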

  3. Trends in cyanide solution concentrations and mine operations at gold mines in Nevada and their potential effects on cyanide-related mortality of vertebrates

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Information on trends in practices at gold mines in Nevada using CN extraction technology and its relation to mortality of vertebrates, especially birds, was needed...

  4. Automated Personal Email Organizer with Information Management and Text Mining Application

    Directory of Open Access Journals (Sweden)

    Dr. Sanjay Tanwani

    2012-04-01

    Full Text Available Email is one of the most ubiquitous applications, used regularly by millions of people worldwide. Professionals have to manage hundreds of emails on a daily basis, sometimes leading to overload and stress. Many emails go unanswered and sometimes remain unattended as time passes. Managing every single email takes a lot of effort, especially when the email transaction log is very large. This work is focused on creating better ways of automatically organizing personal email messages. In this paper, a methodology for automated event information extraction from incoming email messages is proposed. The proposed methodology and the software based on it have helped to improve email management, reducing stress and enabling timely responses to emails.

  5. Mining and environment

    Energy Technology Data Exchange (ETDEWEB)

    Kisgyorgy, S.

    1986-01-01

    The realization of new mining projects should be preceded by detailed studies on the impact of mining activities on the environment. For defining the conditions of environmental protection and for making proper financial plans the preparation of an information system is needed. The possible social effects of the mining investments have to be estimated, first of all from the points of view of waste disposal, mining hydrology, subsidence due to underground mining etc.

  6. Construction of an index of information from clinical practice in Radiology and Imaging Diagnosis based on text mining and thesaurus

    Directory of Open Access Journals (Sweden)

    Paulo Roberto Barbosa Serapiao

    2013-09-01

    Full Text Available Objective To construct a Portuguese language index of information on the practice of diagnostic radiology in order to improve the standardization of the medical language and terminology. Materials and Methods A total of 61,461 definitive reports were collected from the database of the Radiology Information System at Hospital das Clínicas – Faculdade de Medicina de Ribeirão Preto (RIS/HCFMRP) as follows: 30,000 chest x-ray reports; 27,000 mammography reports; and 4,461 thyroid ultrasonography reports. The text mining technique was applied for the selection of terms, and the ANSI/NISO Z39.19-2005 standard was utilized to construct the index based on a thesaurus structure. The system was created in *html. Results The text mining resulted in a set of 358,236 (n = 100%) words. Out of this total, 76,347 (n = 21%) terms were selected to form the index. Such terms refer to anatomical pathology description, imaging techniques, equipment, type of study and some other composite terms. The index system was developed with 78,538 *html web pages. Conclusion The utilization of text mining on a radiological reports database has allowed the construction of a lexical system in Portuguese language consistent with the clinical practice in Radiology.

  7. Automated information extraction of key trial design elements from clinical trial publications.

    Science.gov (United States)

    de Bruijn, Berry; Carini, Simona; Kiritchenko, Svetlana; Martin, Joel; Sim, Ida

    2008-11-06

    Clinical trials are one of the most valuable sources of scientific evidence for improving the practice of medicine. The Trial Bank project aims to improve structured access to trial findings by including formalized trial information into a knowledge base. Manually extracting trial information from published articles is costly, but automated information extraction techniques can assist. The current study highlights a single architecture to extract a wide array of information elements from full-text publications of randomized clinical trials (RCTs). This architecture combines a text classifier with a weak regular expression matcher. We tested this two-stage architecture on 88 RCT reports from 5 leading medical journals, extracting 23 elements of key trial information such as eligibility rules, sample size, intervention, and outcome names. Results show this to be a promising avenue to help critical appraisers, systematic reviewers, and curators quickly identify key information elements in published RCT articles.
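
    A rough sketch of the two-stage idea described above (a sentence classifier followed by a weak regular-expression matcher), here reduced to a keyword-based sentence filter and a regex that pulls out a sample size; the sentences, keyword filter and pattern are assumptions for illustration, not the authors' trained classifier.

    ```python
    # Sketch of a two-stage extractor: (1) pick candidate sentences,
    # (2) apply a weak regular expression to pull out the element value.
    # The sentences, keyword filter and regex are illustrative only and
    # stand in for the paper's trained text classifier.
    import re

    sentences = [
        "Patients were followed for 12 months after discharge.",
        "A total of 248 participants were randomized to the two arms.",
        "The primary outcome was all-cause mortality.",
    ]

    def candidate(sentence: str) -> bool:
        # Stage 1: crude stand-in for the sentence classifier.
        return any(w in sentence.lower() for w in ("randomized", "enrolled", "participants"))

    SAMPLE_SIZE = re.compile(r"\b(\d{2,5})\s+(?:patients|participants|subjects)\b")

    for s in filter(candidate, sentences):
        m = SAMPLE_SIZE.search(s)      # Stage 2: weak regex matcher
        if m:
            print("sample size:", m.group(1))
    ```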

  8. Extraction of information of targets based on frame buffer

    Science.gov (United States)

    Han, Litao; Kong, Qiaoli; Zhao, Xiangwei

    2008-10-01

    Of all the modes of perception, vision is the main channel through which an intelligent virtual agent (IVA) obtains environmental information. Behavior simulation of intelligent objects in an interactive virtual environment must be both realistic and computable in real time. This paper proposes a new method of obtaining environmental information: visual images are generated by setting a second viewport at the location of the IVA's viewpoint, and the target location, distance, azimuth and other basic geometric information, as well as semantic information, are then acquired from these images. Experiments show that the method makes full use of the performance of computer graphics hardware, with a simple process and high efficiency.

  9. Extracting local information from crowds through betting markets

    Science.gov (United States)

    Weijs, Steven

    2015-04-01

    In this research, a set-up is considered in which users can bet against a forecasting agency to challenge their probabilistic forecasts. From an information theory standpoint, a reward structure is considered that either provides the forecasting agency with better information, paying the successful providers of information for their winning bets, or funds excellent forecasting agencies through users that think they know better. Especially for local forecasts, the approach may help to diagnose model biases and to identify local predictive information that can be incorporated in the models. The challenges and opportunities for implementing such a system in practice are also discussed.

  10. Extracting Coherent Information from Noise Based Correlation Processing

    Science.gov (United States)

    2015-09-30

    LONG-TERM GOALS: The goal of this research is to establish methodologies to utilize ambient noise in the ocean and to determine in what scenarios… PUBLICATIONS: [1] "Monitoring deep-ocean temperatures using acoustic ambient noise," K. W. Woolfe, S. Lani, K. G. Sabra, W. A. Kuperman, Geophys. Res. Lett., 42, 2878–2884, doi:10.1002/2015GL063438 (2015). [2] "Optimized extraction of coherent arrivals from ambient noise correlations in…"

  11. A Complex Use of the Materials Extracted from an Open-Cast Lignite Mine

    Science.gov (United States)

    Buryan, Petr; Bučko, Zdeněk; Mika, Petr

    2014-12-01

    The company Sokolovská uhelná was the largest producer of city gas in the Czech Republic. After its substitution by natural gas, the gasification technology became the basis of electricity production in a combined-cycle power plant with a total output of 400 MW. To allow gasification of the liquid by-products formed during coal gasification, an entrained-flow gasifier capable of processing alternative liquid fuels has been installed. The concentrated waste gas containing sulphur compounds is conducted to the desulphurisation unit, where highly desired, pure 96% H2SO4 is produced. Briquettable brown coal is crushed, milled and dried, and then passed into briquetting presses where briquettes, used mainly as a household fuel, are pressed without binder under a pressure of 175 MPa. Fine brown coal dust (multidust) is used commercially for heat production in pulverized-coal burners; it forms not only during coal drying, after separation in electrostatic separators, but is also obtained by milling dried coal in a vibratory bar mill. Slag from the boilers of the conventional power plant, cinder from the generators, and ashes deposited at the dump are dewatered and used as a quality bedding material in the construction of roads in the SUAS mines. Fly ash is used in the building industry for partial substitution of cement in concrete. Flue gases, after separation of fly ash, are desulphurised by the wet limestone method, the main product being gypsum, used, among other things, in the building industry. Expanded clays from the overburden of the coal seams, which are the raw material for the production of the "Liapor" artificial aggregate, are used heavily. This artificial aggregate is characterized by outstanding thermal and acoustic insulating properties.

  12. Mining Patent Knowledge for Automatic Keyword Extraction

    Institute of Scientific and Technical Information of China (English)

    陈忆群; 周如旗; 朱蔚恒; 李梦婷; 印鉴

    2016-01-01

    …expression and professional authority. This paper uses a patent data set as the external knowledge repository serving keyword extraction. An algorithm is designed to construct the background knowledge repository based on the patent data set, and a method for automatic keyword extraction with novel word features is provided. The paper discusses the characteristics of patent data, mines the relations between different patent files to construct a background knowledge repository for the target document, and finally achieves keyword extraction. The patent files related to the target document are used to construct the background knowledge repository. Information on patent inventors, assignees, citations and classifications is used to mine the hidden knowledge and relationships between different patent files, and the related knowledge is imported to extend the background knowledge repository. Novel word features are derived according to the different background knowledge supplied by the patent data. The word features reflecting the document's background knowledge offer valuable indications of individual words' importance in the target document. The keyword extraction problem can then be regarded as a classification problem, and a support vector machine (SVM) is used to extract the keywords. Experiments have been carried out using a patent data set and an open data set. The experimental results show that, using these novel word features, the approach achieves performance in keyword extraction superior to other state-of-the-art approaches.
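
    A minimal sketch of the final classification step: each candidate word is represented by a small feature vector (here only its frequency in the target document and its frequency in a background repository, both invented) and an SVM decides whether the candidate is a keyword. The feature set and toy data are stand-ins, not the paper's patent-derived features.

    ```python
    # Sketch: keyword extraction cast as binary classification with an SVM.
    # Each candidate word gets a small feature vector; the features and the
    # toy training data are invented stand-ins for the paper's patent-based
    # word features.
    from sklearn.svm import SVC

    # features per candidate: [tf in target document, tf in background repository]
    X_train = [[12, 3], [9, 2], [1, 40], [2, 55], [15, 5], [1, 30]]
    y_train = [1, 1, 0, 0, 1, 0]          # 1 = keyword, 0 = not a keyword

    clf = SVC(kernel="rbf").fit(X_train, y_train)

    candidates = {"gasifier": [10, 4], "however": [3, 60]}
    for word, feats in candidates.items():
        label = clf.predict([feats])[0]
        print(word, "-> keyword" if label == 1 else "-> not a keyword")
    ```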

  13. STUDY ON PHYTO-EXTRACTION BALANCE OF ZN, CD AND PB FROM MINE-WASTE POLLUTED SOILS BY USING FESTUCA ARUNDINACEA AND LOLIUM PERENNE SPECIES

    Directory of Open Access Journals (Sweden)

    B. LIXANDRU

    2013-07-01

    Full Text Available Through the cultivation of tall fescue (Festuca arundinacea and of perennial ryegrass for two years on a chernozem type of soil, in the Banat's plain area we investigated the phyto-extraction potential of Zn, Cd and Pb. In the experimental plot it has been incorporated a quantity of 20 kg of mine-waste per square meter, in a mass ratio of 1:2,5. The mine-waste polluting "contribution" was of 1209 mg Zn / kg d.s., 4.70 mg Cd / kg d.s. and 188.2 mg Pb / kg d.s. The metals content in the soil was determined at the two moments of biomass harvesting, and through balance calculations we could establish the phyto-extraction efficiency of the two foragegrasses species. The obtained results indicate that Festuca arundinacea has an average phyto-extraction yield of 50% for Zn and Cd in the soil; in the case of an ionic excess of 3,5 to 4 times, the phyto-extraction efficiency is reduced, more obvious in the case of Pb (lead ions. The species Lolium perenne registers a yield of almost 92% in the process of phyto-extraction of Zn. The yield values for Cd si Pb are lower, but comparable with the control plot. Unlike Festuca arundinacea, the Lollium perenne species tolerates better the Cd and Pb ionic excess.

  14. Coal mine subsidence area extraction from high-resolution remote sensing imagery based on an object-oriented classification method

    Institute of Scientific and Technical Information of China (English)

    李晓霞; 汪云甲

    2011-01-01

    Coal mining activities have caused a series of geological and environmental problems, such as ground surface subsidence. Compared with conventional monitoring techniques such as levelling and GPS, remote sensing can monitor mine subsidence dynamically, with wider coverage and higher efficiency. The object-oriented classification method, which makes better use of the texture and geometric shape information in remote sensing imagery, is increasingly used for high-resolution image classification. Based on the characteristics of mine subsidence areas, this paper studies how to recognize subsidence areas in remote sensing imagery and proposes automatic extraction rules for the object-oriented classification method. The Pansan mine of the Huainan Mining Group in Anhui Province, China, was selected as the experimental area, and SPOT 5 images were used to extract the mine subsidence area with the proposed method; the images were processed using ERDAS IMAGINE 9.2, ENVI 4.7 and ENVI 4.4 ZOOM. A subsidence area of 2.3226 km2 was extracted, and comparison with field survey data shows that the result is convincing and that the object-oriented classification method can effectively and automatically extract subsidence information from high-resolution remote sensing imagery.

  15. Data mining in radiology.

    Science.gov (United States)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-04-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist which need to be addressed to ensure the success of data mining.

  16. Data mining in radiology

    Directory of Open Access Journals (Sweden)

    Amit T Kharat

    2014-01-01

    Full Text Available Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist which need to be addressed to ensure the success of data mining.

  17. Data mining in radiology

    Science.gov (United States)

    Kharat, Amit T; Singh, Amarjit; Kulkarni, Vilas M; Shah, Digish

    2014-01-01

    Data mining facilitates the study of radiology data in various dimensions. It converts large patient image and text datasets into useful information that helps in improving patient care and provides informative reports. Data mining technology analyzes data within the Radiology Information System and Hospital Information System using specialized software which assesses relationships and agreement in available information. By using similar data analysis tools, radiologists can make informed decisions and predict the future outcome of a particular imaging finding. Data, information and knowledge are the components of data mining. Classes, Clusters, Associations, Sequential patterns, Classification, Prediction and Decision tree are the various types of data mining. Data mining has the potential to make delivery of health care affordable and ensure that the best imaging practices are followed. It is a tool for academic research. Data mining is considered to be ethically neutral; however, concerns regarding privacy and legality exist which need to be addressed to ensure the success of data mining. PMID:25024513

  18. Building a Bridge or Digging a Pipeline? Clinical Data Mining in Evidence-Informed Knowledge Building

    Science.gov (United States)

    Epstein, Irwin

    2015-01-01

    Challenging the "bridge metaphor" theme of this conference, this article contends that current practice-research integration strategies are more like research-to-practice "pipelines." The purpose of this article is to demonstrate the potential of clinical data-mining studies conducted by practitioners, practitioner-oriented PhD…

  19. Information and diagnostic tools of objective control as means to improve performance of mining machines

    Science.gov (United States)

    Zvonarev, I. E.; Shishlyannikov, D. I.

    2017-02-01

    The paper justifies the relevance of developing and implementing automated onboard systems for operation data and maintenance recording in heading-and-winning machines. An analysis of the advantages and disadvantages of existing automated onboard systems for operation data and maintenance recording in heading-and-winning machines for potassium mines is presented. The basic technical requirements for the design, operating algorithms and functions of recording systems of mining machines for potassium mines are formulated. A method of controlling operating parameters is presented; the concept of the onboard automated recording system for the Ural heading-and-winning machine is outlined. The results of experimental studies of variations in loading of the Ural-20R miner's operating member drives, using the VATUR portable measuring complex, are given. It is shown that existing means of objective control of operating parameters of the Ural-20R heading-and-winning machine do not assure its optimal operation. The authors present a technique of analyzing the data provided by parameter recorders that allows increasing the efficiency of mechanical complexes by determining numerical values characterizing the technical and technological level of potassium ore production organization. Efficiency assessment criteria for the engineering and maintenance departments of mining enterprises are proposed. A technology for continuous automated monitoring of a potassium mine's outburst hazard is described.

  20. Advanced Methods for Image Information Mining System: Evaluation and Enhancement of User Relevance

    OpenAIRE

    Daschiel, Herbert Andreas

    2004-01-01

    The need for efficient access to large image holdings in databases led to the development of content-based image retrieval (image data mining) methods. As in other fields of research, the further development of these methods depends on whether the methods of image querying as well as image understanding can be evaluated quantitatively. So far, very little attention has been paid to the evaluation of content-based image retrieval systems…

  1. A Study on Environmental Research Trends Using Text-Mining Method - Focus on Spatial information and ICT -

    Science.gov (United States)

    Lee, M. J.; Oh, K. Y.; Joung-ho, L.

    2016-12-01

    Recently there has been much research analysing the interactions between entities by text-mining analysis in various fields. In this paper, we aim to quantitatively analyse research trends in environmental research relating to either spatial information or ICT (Information and Communications Technology) by text-mining analysis. To do this, we applied a low-dimensional embedding method, clustering analysis and association rules to find meaningful associative patterns of keywords frequently appearing in the articles. As the authors assume that KCI (Korea Citation Index) articles reflect academic demand, a total of 1,228 KCI articles published from 1996 to 2015 were reviewed and analysed by the text-mining method. First, we retrieved the KCI articles from the NDSL (National Discovery for Science Leaders) site; we then pre-processed their keywords, selected from the abstracts, and classified them into separable sectors. We investigated the appearance rates and association rules of keywords for articles in the two fields: spatial information and ICT. In order to detect historical trends, the analysis was conducted separately for four periods: 1996-2000, 2001-2005, 2006-2010 and 2011-2015. These analyses were conducted using R software. As a result, we confirmed that environmental research relating to spatial information mainly focused on fields such as 'GIS (35%)', 'remote sensing (25%)' and 'environmental theme maps (15.7%)'. Next, 'ICT technology (23.6%)', 'ICT services (5.4%)', 'mobile (24%)', 'big data (10%)' and 'AI (7%)' primarily emerge from environmental research relating to ICT. Thus, from the analysis results, this paper asserts that the research trends and academic progress are well structured for reviewing recent spatial information and ICT technology, and that the outcomes of the analysis can provide adequate guidelines for establishing environmental policies and strategies. KEY WORDS: Big data, Text-mining, Environmental research, Spatial information, ICT. Acknowledgements: The

  2. Advanced remote sensing terrestrial information extraction and applications

    CERN Document Server

    Liang, Shunlin; Wang, Jindi

    2012-01-01

    Advanced Remote Sensing is an application-based reference that provides a single source of mathematical concepts necessary for remote sensing data gathering and assimilation. It presents state-of-the-art techniques for estimating land surface variables from a variety of data types, including data from optical, radar and lidar sensors. Scientists in a number of different fields including geography, geology, atmospheric science, environmental science, planetary science and ecology will have access to critically-important data extraction techniques and their virtually unlimited application

  3. Spoken Language Understanding Systems for Extracting Semantic Information from Speech

    CERN Document Server

    Tur, Gokhan

    2011-01-01

    Spoken language understanding (SLU) is an emerging field in between speech and language processing, investigating human/ machine and human/ human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract the meaning from speech utterances and its applications are vast, from voice search in mobile devices to meeting summarization, attracting interest from both commercial and academic sectors. Both human/machine and human/human communications can benefit from the application of SLU, usin

  4. Extraction of information about periodic orbits from scattering functions

    CERN Document Server

    Bütikofer, T; Seligman, T H; Bütikofer, Thomas; Jung, Christof; Seligman, Thomas H.

    1999-01-01

    As a contribution to the inverse scattering problem for classical chaotic systems, we show that one can select sequences of intervals of continuity, each of which yields the information about period, eigenvalue and symmetry of one unstable periodic orbit.

  5. Scalable exploratory data mining of distributed geoscientific data

    Energy Technology Data Exchange (ETDEWEB)

    Shek, E.C.; Muntz, R.R.; Mesrobian, E.; Ng, K. [Univ. of California, Los Angeles, CA (United States)

    1996-12-31

    Geoscience studies produce data from various observations, experiments, and simulations at an enormous rate. Exploratory data mining extracts "content information" from massive geoscientific datasets to derive knowledge and provide a compact summary of the dataset. In this paper, we discuss how database query processing and distributed object management techniques can be used to facilitate geoscientific data mining and analysis. Some special requirements of large-scale geoscientific data mining that are addressed include geoscientific data modeling, parallel query processing, and heterogeneous distributed data access.

  6. Sequential Pattern Mining Using Formal Language Tools

    Directory of Open Access Journals (Sweden)

    R. S. Jadon

    2012-09-01

    Full Text Available In the present scenario almost every system and activity is computerized, and hence all information and data are stored in computers. Huge collections of data are emerging, and retrieving untouched, hidden and important information from this huge volume of data is quite a tedious task. Data mining is a technological solution that extracts untouched, hidden and important information from vast databases to investigate noteworthy knowledge in the data warehouse. An important problem in data mining is to discover patterns in various fields such as medical science, the World Wide Web and telecommunications. Sequential pattern mining is one of the data mining methods in which we retrieve hidden patterns linked with instants or other sequences. In sequential pattern mining we extract those sequential patterns whose support count is greater than or equal to a given minimum support threshold. In the current scenario users are interested only in specific and interesting patterns instead of the entire set of possible sequential patterns. To control the exploration space, users can use many heuristics, which can be represented as constraints. Many algorithms have been developed in the field of constrained mining that generate patterns according to user expectations. In the present work we explore and enhance regular expression constraints: regular expressions are one kind of constraint, and a number of algorithms have been developed for sequential pattern mining that use a regular expression as a constraint. Some constraints are neither regular nor context-free, like the cross-serial pattern a^n b^m c^n d^m found in Swiss German data; we cannot construct an equivalent deterministic finite automaton (DFA) or pushdown automaton (PDA) for such patterns. We propose a new algorithm, PMFLT (Pattern Mining using Formal Language Tools), for sequential pattern mining using formal language tools as constraints. The proposed algorithm finds only user-specific frequent sequences in an efficient…
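
    A rough sketch of regex-constrained sequential pattern mining: candidate subsequences are enumerated from a small sequence database, kept only if they satisfy a regular-expression constraint, and reported when their support reaches a minimum threshold. The event alphabet, the constraint and the threshold are invented for illustration, and this brute-force enumeration is not the PMFLT algorithm itself.

    ```python
    # Sketch: frequent subsequences that also satisfy a regular-expression
    # constraint. Brute-force enumeration over a toy database; the alphabet,
    # constraint and support threshold are invented and this is not PMFLT.
    import re
    from itertools import combinations
    from collections import Counter

    database = ["abcac", "acbac", "abccb", "bacba"]   # sequences of events a, b, c
    constraint = re.compile(r"a(b|c)+")                # patterns must start with 'a'
    min_support = 3

    support = Counter()
    for seq in database:
        seen = set()
        for length in range(2, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                sub = "".join(seq[i] for i in idx)     # subsequence, order preserved
                if sub not in seen and constraint.fullmatch(sub):
                    seen.add(sub)
                    support[sub] += 1                  # count once per sequence

    frequent = {p: c for p, c in support.items() if c >= min_support}
    print(frequent)
    ```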

  7. NEW METHOD OF EXTRACTING WEAK FAILURE INFORMATION IN GEARBOX BY COMPLEX WAVELET DENOISING

    Institute of Scientific and Technical Information of China (English)

    CHEN Zhixin; XU Jinwu; YANG Debin

    2008-01-01

    The extraction of weak failure information has always been the difficulty and focus of fault detection. Aiming at the specific statistical properties of the complex wavelet coefficients of gearbox vibration signals, a new signal-denoising method that uses a local adaptive algorithm based on the dual-tree complex wavelet transform (DT-CWT) is introduced to extract weak failure information in gears, and especially to extract impulse components. By taking into account the non-Gaussian probability distribution and the statistical dependencies among the wavelet coefficients of such signals, and by taking advantage of the near shift-invariance of the DT-CWT, a higher signal-to-noise ratio (SNR) than with common wavelet denoising methods can be obtained. Experiments on extracting periodic impulses from gearbox vibration signals indicate that the method can extract incipient fault features and hidden information from heavy noise, and that it has an excellent effect in identifying weak feature signals in gearbox vibration signals.
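
    For orientation only, the sketch below shows ordinary discrete-wavelet soft-threshold denoising of a noisy impulse train with PyWavelets; it is a simplified stand-in for the paper's locally adaptive dual-tree complex wavelet method, and the signal, wavelet and threshold rule are all assumptions.

    ```python
    # Sketch: soft-threshold wavelet denoising of a noisy periodic-impulse
    # signal. Uses a plain DWT (PyWavelets) as a simplified stand-in for the
    # paper's locally adaptive dual-tree complex wavelet method; signal,
    # wavelet and threshold rule are assumptions.
    import numpy as np
    import pywt

    rng = np.random.default_rng(0)
    n = 2048
    signal = np.zeros(n)
    signal[::256] = 5.0                       # periodic impulses (the "fault")
    noisy = signal + rng.normal(0, 1.0, n)    # heavy background noise

    coeffs = pywt.wavedec(noisy, "db8", level=5)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise estimate
    threshold = sigma * np.sqrt(2 * np.log(n))              # universal threshold
    denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft")
                                     for c in coeffs[1:]]
    denoised = pywt.waverec(denoised_coeffs, "db8")[:n]
    print("residual noise std:", np.std(denoised - signal))
    ```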

  8. The Extraction Model of Paddy Rice Information Based on GF-1 Satellite WFV Images.

    Science.gov (United States)

    Yang, Yan-jun; Huang, Yan; Tian, Qing-jiu; Wang, Lei; Geng, Jun; Yang, Ran-ran

    2015-11-01

    At present, using the characteristics of paddy rice at different phenophases to identify it in remote sensing images is an efficient approach to information extraction. A property that clearly distinguishes paddy rice from other vegetation is that paddy fields are covered by a large amount of water in the early growth stage, so the NDWI (normalized difference water index), which is used to extract water information, can reasonably be applied to extract paddy rice at this stage. Using the ratio of NDWI between two phenophases can enlarge the difference between paddy rice and other surface features, which is an important part of extracting paddy rice with high accuracy. The variation of NDVI (normalized difference vegetation index) between phenophases can further improve the accuracy of paddy rice information extraction. This study finds that taking full advantage of the particular behaviour of paddy rice in different phenophases and combining the two indices (NDWI and NDVI) associated with paddy rice can establish a reasonable, accurate and effective extraction model of paddy rice, which is also the main way to improve the accuracy of paddy rice extraction. The paper takes Lai'an in Anhui Province as the research area and rice as the research object, and constructs the paddy rice information extraction model using NDVI and NDWI between the tillering stage and the heading stage. The model was then applied to GF-1 WFV remote sensing images from July 12, 2013 and August 30, 2013, and it effectively extracted the paddy rice distribution in Lai'an, which was then mapped. Finally, the extraction result was verified and evaluated against field investigation data from the study area. The result shows that the extraction model can quickly and accurately obtain the distribution of paddy rice, and that it generalizes well.
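
    A minimal sketch of the index computation behind the model: per-pixel NDVI and NDWI from the green, red and NIR bands, an NDWI ratio between the two dates, and a simple threshold rule to flag candidate paddy pixels. The band arrays and threshold values are invented for illustration and are not the thresholds used in the paper.

    ```python
    # Sketch: NDVI/NDWI computation and a simple two-date rule for flagging
    # candidate paddy-rice pixels. Band arrays and thresholds are invented;
    # the paper's actual model parameters are not reproduced here.
    import numpy as np

    def ndvi(nir, red):
        return (nir - red) / (nir + red + 1e-6)

    def ndwi(green, nir):
        return (green - nir) / (green + nir + 1e-6)

    # hypothetical reflectance bands for two dates (tillering, heading)
    shape = (100, 100)
    rng = np.random.default_rng(1)
    g1, r1, n1 = (rng.uniform(0.02, 0.4, shape) for _ in range(3))
    g2, r2, n2 = (rng.uniform(0.02, 0.4, shape) for _ in range(3))

    ndwi_ratio = ndwi(g1, n1) / (ndwi(g2, n2) + 1e-6)   # water signal fades
    ndvi_change = ndvi(n2, r2) - ndvi(n1, r1)           # canopy signal grows

    # illustrative thresholds: wetter early on, much greener later
    paddy_mask = (ndwi_ratio > 1.5) & (ndvi_change > 0.2)
    print("candidate paddy pixels:", int(paddy_mask.sum()))
    ```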

  9. Resource requirements and economics of the coal-mining process: a comparative analysis of mines in selected countries

    Energy Technology Data Exchange (ETDEWEB)

    Astakhov, A.; Gruebler, A.

    1984-06-01

    This report examines the natural resource requirements and economics of the resource extraction process, taking coal-mining activities as an example. Coal was chosen for the study because it is receiving growing attention as the fossil energy resource with the largest potential to contribute to the world's long-term energy supply. The computerized description of the extraction process is stored in the Coal Mines Data Base (CMDB) which was developed within the framework of this study. The data base currently holds information on 70 mines located in different countries. The analytic approach used is the first of its kind to compare resource requirements and economics of coal mines under such a broad range of geological and socioeconomic conditions. A general model of the factors influencing resource inputs and impacts of the coal-mining process is presented. Then for each of the main mining methods (opencast, conventional underground, and hydraulic underground) the principal geological and technological factors influencing the resource requirements, economics, and environmental impacts, as well as the comparative advantages and disadvantages of each mining method, are discussed. For the three main mining methods the resource requirements (including manpower, energy, materials, and land) and the economics (including construction investments and operating costs) are then quantified and their cost structures (i.e. requirements for the different operations at a mine) are examined in detail using data from coal mines in the USA, the USSR, and other selected coal-producing countries (Australia, Austria, and France).

  10. Information extraction from FN plots of tungsten microemitters

    Energy Technology Data Exchange (ETDEWEB)

    Mussa, Khalil O. [Department of Physics, Mu' tah University, Al-Karak (Jordan); Mousa, Marwan S., E-mail: mmousa@mutah.edu.jo [Department of Physics, Mu' tah University, Al-Karak (Jordan); Fischer, Andreas, E-mail: andreas.fischer@physik.tu-chemnitz.de [Institut für Physik, Technische Universität Chemnitz, Chemnitz (Germany)

    2013-09-15

    Tungsten based microemitter tips have been prepared both clean and coated with dielectric materials. For clean tungsten tips, apex radii have been varied ranging from 25 to 500 nm. These tips were manufactured by electrochemical etching a 0.1 mm diameter high purity (99.95%) tungsten wire at the meniscus of two molar NaOH solution. Composite micro-emitters considered here consist of a tungsten core coated with different dielectric materials—such as magnesium oxide (MgO), sodium hydroxide (NaOH), tetracyanoethylene (TCNE), and zinc oxide (ZnO). It is worthwhile noting here that the rather unconventional NaOH coating has shown several interesting properties. Various properties of these emitters were measured including current–voltage (IV) characteristics and the physical shape of the tips. A conventional field emission microscope (FEM) with a tip (cathode)–screen (anode) separation standardized at 10 mm was used to electrically characterize the electron emitters. The system was evacuated down to a base pressure of ∼10^−8 mbar when baked at up to ∼180°C overnight. This allowed measurements of typical field electron emission (FE) characteristics, namely the IV characteristics and the emission images on a conductive phosphorus screen (the anode). Mechanical characterization has been performed through a FEI scanning electron microscope (SEM). Within this work, the mentioned experimental results are connected to the theory for analyzing Fowler–Nordheim (FN) plots. We compared and evaluated the data extracted from clean tungsten tips of different radii and determined deviations between the results of different extraction methods applied. In particular, we derived the apex radii of several clean and coated tungsten tips by both SEM imaging and analyzing FN plots. The aim of this analysis is to support the ongoing discussion on recently developed improvements of the theory for analyzing FN plots related to metal field electron emitters, which in
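
    As an illustration of the basic Fowler–Nordheim plot analysis mentioned above, the sketch below linearises measured I–V data as ln(I/V²) versus 1/V and extracts the slope and intercept with a least-squares fit; the data points are invented, and the subsequent conversion of the slope into an apex radius or field-enhancement factor (which depends on the emitter model and work function) is not shown.

    ```python
    # Sketch: basic Fowler-Nordheim plot analysis. Plots ln(I/V^2) against
    # 1/V and fits a straight line; slope and intercept then feed the
    # emitter model. The I-V points are invented; the conversion of the
    # slope into an apex radius or field-enhancement factor is not shown.
    import numpy as np

    voltage = np.array([800., 900., 1000., 1100., 1200.])      # V (hypothetical)
    current = np.array([2e-9, 1.2e-8, 5e-8, 1.6e-7, 4.5e-7])   # A (hypothetical)

    x = 1.0 / voltage                    # 1/V
    y = np.log(current / voltage**2)     # ln(I/V^2)

    slope, intercept = np.polyfit(x, y, 1)
    print(f"FN slope = {slope:.1f} V, intercept = {intercept:.2f}")
    ```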

  11. Land and natural resource information and some potential environmental effects of surface mining of coal in the Gillette area, Wyoming

    Science.gov (United States)

    Keefer, William Richard; Hadley, R.F.

    1976-01-01

    Campbell County, along the east margin of the Powder River Basin in northeastern Wyoming, contains more coal than any other county in the United States. The principal deposit is the Wyodak-Anderson coal bed. The bed is 50-100 feet (15-30 meters) thick over large areas, lies less than 200 feet (60 meters) deep in a north-south trending strip nearly 100 miles (161 kilometers) long and 2-3 miles (3-5 kilometers) wide, and contains an estimated 15 billion tons (13.6 billion metric tons) of sub-bituminous, low-sulfur coal that is presently considered to be accessible to surface mining. Extensive mining of this deposit has the potential for causing a variety of environmental impacts and has been a matter of much public concern and debate in recent years. An integrated program of geologic, hydrologic, geochemical, and related studies by the U.S. Geological Survey in central Campbell County provides basic information about the land and its resources, including (1) characteristics of the landscape, (2) properties of rocks and surface materials, (3) depth and thickness of coal, (4) streamflow, (5) depth to ground water, (6) quality of ground water, (7) sediment yield, (8) concentrations of trace elements in soils, rocks, coal, vegetation, and water, and (9) current land use. The data are used to analyze and predict some of the potential environmental effects of surface mining, such as the extent of land disturbance, nature and degree of landscape modification, and disruption of surface-water and ground-water systems. Advance knowledge and understanding of these and other problems are useful in the planning and regulation of future leasing, mining, reclamation, and related activities.

  12. Advanced Extraction of Spatial Information from High Resolution Satellite Data

    Science.gov (United States)

    Pour, T.; Burian, J.; Miřijovský, J.

    2016-06-01

    In this paper the authors processed five satellite images of five different Central European cities taken by five different sensors. The aim of the paper was to find methods and approaches for evaluating and extracting spatial data from the areas of interest. For this purpose, the data were first pre-processed using image fusion, mosaicking and segmentation. The results carried into the next step were two polygon layers, the first representing single objects and the second representing city blocks. In the second step, the polygon layers were classified and exported into the Esri shapefile format. Classification was partly hierarchical expert-based and partly based on the SEaTH tool, used for separability analysis and thresholding. The final results, along with visual previews, were attached to the original thesis. The results are evaluated visually and statistically in the last part of the paper. In the discussion the authors describe the difficulties of working with data of large size, taken by different sensors and also differing thematically.

  13. Extraction of Information on the Technical Effect from a Patent Document

    Science.gov (United States)

    Sakai, Hiroyuki; Nonaka, Hirohumi; Masuyama, Shigeru

    We propose a method for extracting information on the technical effect from a patent document. The information on the technical effect extracted by our method is useful for generating patent maps (see e.g., Figure 1.) automatically or analyzing the technical trend from patent documents. Our method extracts expressions containing the information on the technical effect by using frequent expressions and clue expressions effective for extracting them. The frequent expressions and clue expressions are extracted by using statistical information and initial clue expressions automatically. Our method extracts expressions containing the information on the technical effect without predetermined patterns given by hand, and is expected to be applied to other tasks for acquiring expressions that have a particular meaning (e.g., information on the means for solving the problems) not limited to the information on the technical effect. Our method achieves not only high precision (78.0%) but also high recall (77.6%) by acquiring such clue expressions automatically from patent documents.

  14. Swamp Works: A New Approach to Develop Space Mining and Resource Extraction Technologies at the National Aeronautics Space Administration (NASA) Kennedy Space Center (KSC)

    Science.gov (United States)

    Mueller, R. P.; Sibille, L.; Leucht, K.; Smith, J. D.; Townsend, I. I.; Nick, A. J.; Schuler, J. M.

    2015-01-01

    …environment and methodology, with associated laboratories, that uses lean development methods and creativity-enhancing processes to invent and develop new solutions for space exploration. This paper will discuss the Swamp Works approach to developing space mining and resource extraction systems and the vision of space development it serves. The ultimate goal of the Swamp Works is to expand human civilization into the solar system via the use of local resources. By mining and using local resources in situ, it is conceivable that one day the logistics supply train from Earth can be eliminated, enabling the Earth independence of a space-based community.

  15. On Depth Information Extraction from Metal Detector Signals

    NARCIS (Netherlands)

    Schoolderman, A.J.; Wolf, F.J. de; Merlat, L.

    2003-01-01

    Information on the depth of objects detected with the help of a metal detector is useful for safe excavation of these objects in demining operations. Apart from that, depth information may be used in advanced sensor fusion algorithms for a detection system where a metal detector is combined with e.g.

  16. Extracting Conflict-free Information from Multi-labeled Trees

    CERN Document Server

    Deepak, Akshay; McMahon, Michelle M

    2012-01-01

    A multi-labeled tree, or MUL-tree, is a phylogenetic tree where two or more leaves share a label, e.g., a species name. A MUL-tree can imply multiple conflicting phylogenetic relationships for the same set of taxa, but can also contain conflict-free information that is of interest and yet is not obvious. We define the information content of a MUL-tree T as the set of all conflict-free quartet topologies implied by T, and define the maximal reduced form of T as the smallest tree that can be obtained from T by pruning leaves and contracting edges while retaining the same information content. We show that any two MUL-trees with the same information content exhibit the same reduced form. This introduces an equivalence relation in MUL-trees with potential applications to comparing MUL-trees. We present an efficient algorithm to reduce a MUL-tree to its maximally reduced form and evaluate its performance on empirical datasets in terms of both quality of the reduced tree and the degree of data reduction achieved.

  17. Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures.

    Science.gov (United States)

    Humphreys, K; Demetriou, G; Gaizauskas, R

    2000-01-01

    Information extraction technology, as defined and developed through the U.S. DARPA Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper we consider the application of this technology to the extraction of information from scientific journal papers in the area of molecular biology. In particular, we describe how an information extraction system designed to participate in the MUC exercises has been modified for two bioinformatics applications: EMPathIE, concerned with enzyme and metabolic pathways; and PASTA, concerned with protein structure. Progress to date provides convincing grounds for believing that IE techniques will deliver novel and effective ways for scientists to make use of the core literature which defines their disciplines.

  18. A review of literature on the use of machine learning methods for opinion mining

    Directory of Open Access Journals (Sweden)

    Aytuğ ONAN

    2016-05-01

    Full Text Available Opinion mining is an emerging field which uses methods of natural language processing, text mining and computational linguistics to extract the subjective information of opinion holders. Opinion mining can be viewed as a classification problem, and hence machine learning based methods are widely employed for sentiment classification. Machine learning based methods in opinion mining can be mainly classified as supervised, semi-supervised and unsupervised methods. In this study, the main existing literature on the use of machine learning methods for opinion mining is presented. In addition, the strengths and weaknesses of machine learning methods are discussed.

  19. Discrimination and Privacy in the Information Society Data Mining and Profiling in Large Databases

    CERN Document Server

    Calders, Toon; Schermer, Bart; Zarsky, Tal

    2013-01-01

    Vast amounts of data are nowadays collected, stored and processed, in an effort to assist in  making a variety of administrative and governmental decisions. These innovative steps considerably improve the speed, effectiveness and quality of decisions. Analyses are increasingly performed by data mining and profiling technologies that statistically and automatically determine patterns and trends. However, when such practices lead to unwanted or unjustified selections, they may result in unacceptable forms of  discrimination. Processing vast amounts of data may lead to situations in which data controllers know many of the characteristics, behaviors and whereabouts of people. In some cases, analysts might know more about individuals than these individuals know about themselves. Judging people by their digital identities sheds a different light on our views of privacy and data protection. This book discusses discrimination and privacy issues related to data mining and profiling practices. It provides technologic...

  20. Process mining using convex polytopes

    OpenAIRE

    Alemany Puig, Lluís

    2017-01-01

    Process mining is a relatively young field of study that highlights the difficulty of inferring models of processes from which to extract enough information to make predictions about their behaviour, find bottlenecks and identify causality relationships, so as to be able to answer as many questions as one can ask about them. In this context, a process may be understood as any activity performed by humans or computers, or the result of the interaction between the two. Research on this topic has...

  1. Mining and Energy Boom, Dutch Disease and Informality in Colombia: a DSGE Approach

    OpenAIRE

    2016-01-01

    The paper develops a Dynamic Stochastic General Equilibrium (DSGE) model, which assesses the macroeconomic and labor market effects derived from simulating a positive shock to the stochastic component of mining-energy sector productivity. Calibrating the model for the Colombian economy, this shock generates an overall increase in formal wages and a rise in tax revenues, expanding total consumption of the household members. These facts increase non-tradable goods prices relative to tradable ...

  2. Financial Information Extraction Using Pre-defined and User-definable Templates in the LOLITA System

    OpenAIRE

    Costantino, Marco; Morgan, Richard G.; Collingham, Russell J.

    1996-01-01

    This paper addresses the issue of information extraction in the financial domain within the framework of a large Natural Language Processing system: LOLITA. The LOLITA system, Large-scale Object-based Linguistic Interactor Translator and Analyser, is a general purpose natural language processing system. Different kinds of applications have been built around the system's core. One of these is the financial information extraction application, which has been designed in close contact with expert...

  3. Extracting information masked by the chaotic signal of a time-delay system.

    Science.gov (United States)

    Ponomarenko, V I; Prokhorov, M D

    2002-08-01

    We further develop the method proposed by Bezruchko et al. [Phys. Rev. E 64, 056216 (2001)] for the estimation of the parameters of time-delay systems from time series. Using this method we demonstrate a possibility of message extraction for a communication system with nonlinear mixing of information signal and chaotic signal of the time-delay system. The message extraction procedure is illustrated using both numerical and experimental data and different kinds of information signals.

  4. Imaged document information location and extraction using an optical correlator

    Science.gov (United States)

    Stalcup, Bruce W.; Dennis, Phillip W.; Dydyk, Robert B.

    1999-12-01

    Today, the paper document is fast becoming a thing of the past. With the rapid development of fast, inexpensive computing and storage devices, many government and private organizations are archiving their documents in electronic form (e.g., personnel records, medical records, patents, etc.). Many of these organizations are converting their paper archives to electronic images, which are then stored in a computer database. Because of this, there is a need to efficiently organize this data into comprehensive and accessible information resources and provide for rapid access to the information contained within these imaged documents. To meet this need, Litton PRC and Litton Data Systems Division are developing a system, the Imaged Document Optical Correlation and Conversion System (IDOCCS), to provide a total solution to the problem of managing and retrieving textual and graphic information from imaged document archives. At the heart of IDOCCS, optical correlation technology provides a means for the search and retrieval of information from imaged documents. IDOCCS can be used to rapidly search for key words or phrases within the imaged document archives and has the potential to determine the types of languages contained within a document. In addition, IDOCCS can automatically compare an input document with the archived database to determine if it is a duplicate, thereby reducing the overall resources required to maintain and access the document database. Embedded graphics on imaged pages can also be exploited, e.g., imaged documents containing an agency's seal or logo can be singled out. In this paper, we present a description of IDOCCS as well as preliminary performance results and theoretical projections.

  5. Information Extraction of High Resolution Remote Sensing Images Based on the Calculation of Optimal Segmentation Parameters.

    Science.gov (United States)

    Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei

    2016-01-01

    Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in the object-oriented information extraction of high-resolution remote sensing images. The accuracy of remote sensing special subject information depends on this extraction. On the basis of WorldView-2 high-resolution data and an optimal-segmentation-parameter method for object-oriented image segmentation and high-resolution image information extraction, the following processes were conducted in this study. Firstly, the best combination of bands and weights was determined for the information extraction of the high-resolution remote sensing image. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor and compactness factor parameters were computed with the use of control variables and the combination of heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features. The high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting the information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can explain expert input judgment by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme.

  6. Information Extraction of High Resolution Remote Sensing Images Based on the Calculation of Optimal Segmentation Parameters.

    Directory of Open Access Journals (Sweden)

    Hongchun Zhu

    Full Text Available Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in the object-oriented information extraction of high-resolution remote sensing images. The accuracy of remote sensing special subject information depends on this extraction. On the basis of WorldView-2 high-resolution data and an optimal-segmentation-parameter method for object-oriented image segmentation and high-resolution image information extraction, the following processes were conducted in this study. Firstly, the best combination of bands and weights was determined for the information extraction of the high-resolution remote sensing image. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor and compactness factor parameters were computed with the use of control variables and the combination of heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features. The high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters. A hierarchical network structure was established by setting the information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can explain expert input judgment by reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme.

  7. ADVANCED EXTRACTION OF SPATIAL INFORMATION FROM HIGH RESOLUTION SATELLITE DATA

    Directory of Open Access Journals (Sweden)

    T. Pour

    2016-06-01

    Full Text Available In this paper the authors processed five satellite images of five different Middle-European cities taken by five different sensors. The aim of the paper was to find methods and approaches leading to evaluation and spatial data extraction from areas of interest. For this reason, the data were first pre-processed using image fusion, mosaicking and segmentation processes. The results going into the next step were two polygon layers: the first representing single objects and the second representing city blocks. In the second step, the polygon layers were classified and exported into Esri shapefile format. Classification was partly hierarchical and expert-based, and partly based on the SEaTH tool used for separability analysis and thresholding. Final results along with visual previews were attached to the original thesis. Results are evaluated visually and statistically in the last part of the paper. In the discussion, the authors describe the difficulties of working with data that are large in size, taken by different sensors and also thematically different.

  8. Seeking science information online: Data mining Google to better understand the roles of the media and the education system.

    Science.gov (United States)

    Segev, Elad; Baram-Tsabari, Ayelet

    2012-10-01

    Which extrinsic cues motivate people to search for science-related information? For many science-related search queries, media attention and time during the academic year are highly correlated with changes in information seeking behavior (expressed by changes in the proportion of Google science-related searches). The data mining analysis presented here shows that changes in the volume of searches for general and well-established science terms are strongly linked to the education system. By contrast, ad-hoc events and current concerns were better aligned with media coverage. The interest and ability to independently seek science knowledge in response to current events or concerns is one of the fundamental goals of the science literacy movement. This method provides a mirror of extrapolated behavior and as such can assist researchers in assessing the role of the media in shaping science interests, and inform the ways in which lifelong interests in science are manifested in real world situations.

  9. Applications of state estimation in multi-sensor information fusion for the monitoring of open pit mine slope deformation

    Institute of Scientific and Technical Information of China (English)

    FU Hua; LIU Yin-ping; XIAO Jian

    2008-01-01

    The traditional open-pit mine slope deformation monitoring system cannot use monitoring information coming from many monitoring points at the same time; it can only use the monitoring data coming from one key monitoring point, that is to say, it can only handle one-dimensional time series. Given this shortcoming, multi-sensor information fusion based on state estimation techniques is introduced into the slope deformation monitoring system; owing to the dynamic characteristics of slope deformation, the open-pit slope is regarded as a dynamic target and its condition monitoring as dynamic target tracking. Distributed information fusion technology with feedback was used to process the monitoring data, Kalman filtering algorithms were introduced on this basis, and simulation examples were used to prove their effectiveness.
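
    The record does not give the filter equations, but a minimal sketch of the kind of Kalman filtering step such a fusion scheme relies on might look as follows. It uses plain NumPy; the constant-velocity model and the noise values are illustrative assumptions, not taken from the paper.

      # Minimal 1-D constant-velocity Kalman filter for a displacement time series.
      import numpy as np

      dt = 1.0                                   # sampling interval (assumed)
      F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition: [displacement, velocity]
      H = np.array([[1.0, 0.0]])                 # only displacement is measured
      Q = 1e-4 * np.eye(2)                       # process noise (illustrative)
      R = np.array([[0.25]])                     # measurement noise (illustrative)

      x = np.zeros((2, 1))                       # initial state estimate
      P = np.eye(2)                              # initial covariance

      def kalman_step(z):
          """One predict/update cycle for a single displacement measurement z."""
          global x, P
          # Predict
          x = F @ x
          P = F @ P @ F.T + Q
          # Update
          y = np.array([[z]]) - H @ x            # innovation
          S = H @ P @ H.T + R
          K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
          x = x + K @ y
          P = (np.eye(2) - K @ H) @ P
          return float(x[0, 0])                  # filtered displacement estimate

      measurements = [0.0, 0.4, 1.1, 1.4, 2.2]   # toy monitoring-point readings
      filtered = [kalman_step(z) for z in measurements]
      print(filtered)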

  10. An Effective Approach to Biomedical Information Extraction with Limited Training Data

    CERN Document Server

    Jonnalagadda, Siddhartha

    2011-01-01

    Overall, the two main contributions of this work include the application of sentence simplification to association extraction as described above, and the use of distributional semantics for concept extraction. The proposed work on concept extraction amalgamates for the first time two diverse research areas - distributional semantics and information extraction. This approach renders all the advantages offered in other semi-supervised machine learning systems, and, unlike other proposed semi-supervised approaches, it can be used on top of different basic frameworks and algorithms. http://gradworks.umi.com/34/49/3449837.html

  11. 36 CFR 6.7 - Mining wastes.

    Science.gov (United States)

    2010-07-01

    ... DISPOSAL SITES IN UNITS OF THE NATIONAL PARK SYSTEM, § 6.7 Mining wastes. (a) Solid waste from mining includes but is not limited to mining overburden, mining byproducts, solid waste from the extraction... (36 CFR, Parks, Forests, and Public Property, 2010-07-01)

  12. Omnidirectional vision systems calibration, feature extraction and 3D information

    CERN Document Server

    Puig, Luis

    2013-01-01

    This work focuses on central catadioptric systems, from the early step of calibration to high-level tasks such as 3D information retrieval. The book opens with a thorough introduction to the sphere camera model, along with an analysis of the relation between this model and actual central catadioptric systems. Then, a new approach to calibrate any single-viewpoint catadioptric camera is described.  This is followed by an analysis of existing methods for calibrating central omnivision systems, and a detailed examination of hybrid two-view relations that combine images acquired with uncalibrated

  13. Data-Driven Information Extraction from Chinese Electronic Medical Records.

    Directory of Open Access Journals (Sweden)

    Dong Xu

    Full Text Available This study aims to propose a data-driven framework that takes unstructured free-text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event. Our framework uses a hybrid approach. It consists of constructing cross-domain core medical lexica, an unsupervised, iterative algorithm to accrue more accurate terms into the lexica, rules to address Chinese writing conventions and temporal descriptors, and a Support Vector Machine (SVM) algorithm that innovatively utilizes Normalized Google Distance (NGD) to estimate the correlation between medical events and their descriptions. The effectiveness of the framework was demonstrated with a dataset of 24,817 de-identified Chinese EMRs. The cross-domain medical lexica were capable of recognizing terms with an F1-score of 0.896. 98.5% of recorded medical events were linked to temporal descriptors. The NGD SVM description-event matching achieved an F1-score of 0.874. The end-to-end time-event-description extraction of our framework achieved an F1-score of 0.846. In terms of named entity recognition, the proposed framework outperforms state-of-the-art supervised learning algorithms (F1-score: 0.896 vs. 0.886). In event-description association, the NGD SVM is superior to SVM using only local context and semantic features (F1-score: 0.874 vs. 0.838). The framework is data-driven, weakly supervised, and robust against the variations and noises that tend to occur in a large corpus. It addresses Chinese medical writing conventions and variations in writing styles through patterns used for discovering new terms and rules for updating the lexica.
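
    The framework above scores event-description pairs with the Normalized Google Distance (NGD). The standard NGD formula (Cilibrasi and Vitányi) is sketched below; the counting values are hypothetical stand-ins, since in the paper's setting the counts would come from a search engine or corpus that is not reproduced here.

      # Normalized Google Distance between two terms from (co-)occurrence counts.
      import math

      def ngd(f_x, f_y, f_xy, n_docs):
          """f_x, f_y: document counts for each term; f_xy: co-occurrence count;
          n_docs: total number of indexed documents. Smaller NGD = more related."""
          if f_xy == 0:
              return float("inf")
          log_fx, log_fy, log_fxy = math.log(f_x), math.log(f_y), math.log(f_xy)
          return (max(log_fx, log_fy) - log_fxy) / (math.log(n_docs) - min(log_fx, log_fy))

      # Toy counts (hypothetical): an event term vs. a candidate description term.
      print(ngd(f_x=12000, f_y=9000, f_xy=7000, n_docs=25_000_000))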

  14. Research of Safety Information Management Model of Coal Mine Based on Data Mining%基于数据挖掘的煤矿安全信息管理模型的研究

    Institute of Scientific and Technical Information of China (English)

    赵文涛; 杨静

    2009-01-01

    Based on an analysis of data mining technology, rough set theory, fuzzy logic and neural network algorithms, the paper proposes a design scheme for a data-mining-based coal mine safety information management model. The model uses Web server registration technology and XML data composition technology to form the terminal database, adopts rough set theory together with the fuzzy logic and neural network algorithms of data mining to form the terminal data warehouse, and analyzes, manages and maintains the terminal data warehouse in a unified way. The model effectively improves the efficiency of coal mine safety information management.

  15. Extraction of Left Ventricular Ejection Fraction Information from Various Types of Clinical Reports.

    Science.gov (United States)

    Kim, Youngjun; Garvin, Jennifer H; Goldstein, Mary K; Hwang, Tammy S; Redd, Andrew; Bolton, Dan; Heidenreich, Paul A; Meystre, Stéphane M

    2017-02-02

    Efforts to improve the treatment of congestive heart failure, a common and serious medical condition, include the use of quality measures to assess guideline-concordant care. The goal of this study is to identify left ventricular ejection fraction (LVEF) information from various types of clinical notes, and to then use this information for heart failure quality measurement. We analyzed the annotation differences between a new corpus of clinical notes from the Echocardiography, Radiology, and Text Integrated Utility package and other corpora annotated for natural language processing (NLP) research in the Department of Veterans Affairs. These reports contain varying degrees of structure. To examine whether existing LVEF extraction modules we developed in prior research improve the accuracy of LVEF information extraction from the new corpus, we created two sequence-tagging NLP modules trained with a new data set, with or without predictions from the existing LVEF extraction modules. We also conducted a set of experiments to examine the impact of training data size on information extraction accuracy. We found that less training data is needed when reports are highly structured, and that combining predictions from existing LVEF extraction modules improves information extraction when reports have less structured formats and a rich set of vocabulary.

  16. Identification of Mine-Shaped Objects based on an Efficient Phase Stepped-Frequency Radar Approach

    DEFF Research Database (Denmark)

    Sørensen, Helge Bjarup Dissing; Jakobsen, Kaj Bjarne; Nymann, Ole

    1997-01-01

    A computationally efficient approach to identify very small mine-shaped plastic objects, e.g. M56 Anti-Personnel (AP) mines buried in the ground, is presented. The size of the objects equals the smallest AP-mines in use today, i.e., the most difficult mines to detect with respect to humanitarian mine...... a radar probe is moved automatically to measure at each grid point a set of reflection coefficients from which phase and amplitude information are extracted. Based on simple processing of the phase information, quaternary imaging and template cross-correlation, successful detection of metal and non-metal mine-shaped objects is possible. Measurements have been performed on loamy soil containing different mine-shaped objects...

  17. DrugQuest - a text mining workflow for drug association discovery

    OpenAIRE

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A.; Theodosiou, Theodosios; Vizirianakis, Ioannis S.; Iliopoulos, Ioannis

    2016-01-01

    Background Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. Results Herein, we apply a text mining approach on the DrugBank database in order to explore drug associations based...

  18. Image Mining Using Texture and Shape Feature

    Directory of Open Access Journals (Sweden)

    Prof.Rupali Sawant

    2010-12-01

    Full Text Available Discovering knowledge from data stored in typical alphanumeric databases, such as relational databases, has been the focal point of most work in database mining. However, with advances in secondary and tertiary storage capacity, coupled with relatively low storage cost, more and more non-standard data (in the form of images) are being accumulated. This vast collection of image data can also be mined to discover new and valuable knowledge. During the process of image mining, concepts in different hierarchies and their relationships are extracted at different hierarchies and granularities, and association rule mining and concept clustering are consequently implemented. The generalization and specialization of concepts are realized in different hierarchies: lower-layer concepts can be upgraded to upper-layer concepts, and upper-layer concepts guide the extraction of lower-layer concepts. It is a process from image data to image information, from image information to image knowledge, and from lower-layer concepts to an upper-layer concept lattice, for which concept lattice and cloud model theory are proposed. The methods of image mining from image texture and shape features are introduced here, which include the following basic steps: firstly, pre-process the images; secondly, use the cloud model to extract concepts; lastly, use the concept lattice to extract a series of image knowledge.

  19. Post-processing of Deep Web Information Extraction Based on Domain Ontology

    Directory of Open Access Journals (Sweden)

    PENG, T.

    2013-11-01

    Full Text Available Many methods are utilized to extract and process query results in the deep Web, relying on the different structures of Web pages and various design modes of databases. However, some semantic meanings and relations are ignored. So, in this paper, we present an approach for post-processing deep Web query results based on domain ontology which can utilize those semantic meanings and relations. A block identification model (BIM) based on node similarity is defined to extract data blocks that are relevant to a specific domain after reducing noisy nodes. A feature vector of domain books is obtained by a result set extraction model (RSEM) based on the vector space model (VSM). RSEM, in combination with BIM, builds the domain ontology on books, which can not only remove the limit of Web page structures when extracting data information, but also make use of the semantic meanings of the domain ontology. After extracting basic information of Web pages, a ranking algorithm is adopted to offer an ordered list of data records to users. Experimental results show that BIM and RSEM extract data blocks and build the domain ontology accurately. In addition, relevant data records and basic information are extracted and ranked. The precision and recall results show that our proposed method is feasible and efficient.
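
    The result set extraction model above rests on the vector space model (VSM). A minimal sketch of how feature vectors and a similarity score between two text blocks could be computed is given below; plain term-frequency weighting is used here for brevity, whereas the paper may use a richer weighting scheme.

      # Term-frequency vectors and cosine similarity for two text blocks (VSM sketch).
      import math
      from collections import Counter

      def cosine_similarity(text_a, text_b):
          va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
          common = set(va) & set(vb)
          dot = sum(va[t] * vb[t] for t in common)
          norm_a = math.sqrt(sum(c * c for c in va.values()))
          norm_b = math.sqrt(sum(c * c for c in vb.values()))
          return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

      block_1 = "data mining algorithms for large databases"
      block_2 = "algorithms for mining large text databases"
      print(cosine_similarity(block_1, block_2))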

  20. STUDY ON EXTRACTING METHODS OF BURIED GEOLOGICAL INFORMATION IN HUAIBEI COAL FIELD

    Institute of Scientific and Technical Information of China (English)

    王四龙; 赵学军; 凌贻棕; 刘玉荣; 宁书年; 侯德文

    1999-01-01

    The features and formation mechanism of buried geological information in the geological, geophysical and remote sensing data of the Huaibei coal field are discussed, and methods for extracting buried tectonic and igneous rock information from various geological data using digital image processing techniques are studied.

  1. Data Mining of NS-2 Trace File

    Directory of Open Access Journals (Sweden)

    Ahmed Jawad Kadhim

    2014-11-01

    Full Text Available Data mining is an important process to extract useful information and patterns from huge amounts of data. NS-2 is an efficient tool to build network environments. The result of simulating these environments in NS-2 is a trace file that contains several columns and lines representing the network events. This trace file can be used to analyse the network according to performance metrics, but it has redundant columns and rows. So, this paper performs data mining in order to keep only the information necessary for the analysis operation, reducing the execution time and the storage size of the trace file.
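
    As a sketch of the kind of reduction the paper describes, the snippet below parses an NS-2 old-format wired trace and keeps only a handful of columns useful for basic analysis. The column layout follows the common old trace format and the particular field selection is an assumption made for illustration.

      # Reduce an NS-2 old-format trace to the fields needed for basic analysis.
      # Typical old wired-trace columns: event time from to type size flags fid src dst seq pktid
      import csv

      def reduce_trace(in_path, out_path):
          with open(in_path) as src, open(out_path, "w", newline="") as dst:
              writer = csv.writer(dst)
              writer.writerow(["event", "time", "pkt_type", "pkt_size", "flow_id", "pkt_id"])
              for line in src:
                  cols = line.split()
                  if len(cols) < 12 or cols[0] not in ("+", "-", "r", "d"):
                      continue                      # skip malformed or irrelevant lines
                  event, time, pkt_type, size, fid, pkt_id = (
                      cols[0], cols[1], cols[4], cols[5], cols[7], cols[11])
                  writer.writerow([event, time, pkt_type, size, fid, pkt_id])

      # reduce_trace("out.tr", "reduced.csv")   # file names are illustrative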

  2. Analysis of Automated Modern Web Crawling and Testing Tools and Their Possible Employment for Information Extraction

    Directory of Open Access Journals (Sweden)

    Tomas Grigalis

    2012-04-01

    Full Text Available World Wide Web has become an enormously big repository of data. Extracting, integrating and reusing this kind of data has a wide range of applications, including meta-searching, comparison shopping, business intelligence tools and security analysis of information in websites. However, reaching information in modern WEB 2.0 web pages, where HTML tree is often dynamically modified by various JavaScript codes, new data are added by asynchronous requests to the web server and elements are positioned with the help of cascading style sheets, is a difficult task. The article reviews automated web testing tools for information extraction tasks.Article in Lithuanian

  3. Weak characteristic information extraction from early fault of wind turbine generator gearbox

    Science.gov (United States)

    Xu, Xiaoli; Liu, Xiuli

    2017-04-01

    Given the weak early degradation characteristic information during early fault evolution in gearbox of wind turbine generator, traditional singular value decomposition (SVD)-based denoising may result in loss of useful information. A weak characteristic information extraction based on μ-SVD and local mean decomposition (LMD) is developed to address this problem. The basic principle of the method is as follows: Determine the denoising order based on cumulative contribution rate, perform signal reconstruction, extract and subject the noisy part of signal to LMD and μ-SVD denoising, and obtain denoised signal through superposition. Experimental results show that this method can significantly weaken signal noise, effectively extract the weak characteristic information of early fault, and facilitate the early fault warning and dynamic predictive maintenance.
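
    The μ-SVD step in the method builds on truncating small singular values. A bare-bones sketch of that underlying idea, applied to a Hankel (trajectory) matrix of a vibration signal, is shown below; the window length and the rule used here for choosing the denoising order (a fixed cumulative contribution rate) are illustrative stand-ins for the paper's procedure.

      # Truncated-SVD denoising of a 1-D vibration signal via a Hankel matrix.
      import numpy as np

      def svd_denoise(signal, window=64, energy=0.90):
          n = len(signal)
          rows = n - window + 1
          hankel = np.array([signal[i:i + window] for i in range(rows)])
          u, s, vt = np.linalg.svd(hankel, full_matrices=False)
          # Denoising order: keep the leading singular values whose cumulative
          # contribution rate reaches the chosen energy threshold.
          k = int(np.searchsorted(np.cumsum(s) / s.sum(), energy)) + 1
          approx = (u[:, :k] * s[:k]) @ vt[:k, :]
          # Reconstruct the signal by averaging over the Hankel anti-diagonals.
          denoised = np.zeros(n)
          counts = np.zeros(n)
          for i in range(rows):
              denoised[i:i + window] += approx[i]
              counts[i:i + window] += 1
          return denoised / counts

      t = np.linspace(0, 1, 512)
      noisy = np.sin(2 * np.pi * 25 * t) + 0.5 * np.random.default_rng(0).normal(size=512)
      clean = svd_denoise(noisy)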

  4. Weak characteristic information extraction from early fault of wind turbine generator gearbox

    Science.gov (United States)

    Xu, Xiaoli; Liu, Xiuli

    2017-09-01

    Given the weak early degradation characteristic information during early fault evolution in gearbox of wind turbine generator, traditional singular value decomposition (SVD)-based denoising may result in loss of useful information. A weak characteristic information extraction based on μ-SVD and local mean decomposition (LMD) is developed to address this problem. The basic principle of the method is as follows: Determine the denoising order based on cumulative contribution rate, perform signal reconstruction, extract and subject the noisy part of signal to LMD and μ-SVD denoising, and obtain denoised signal through superposition. Experimental results show that this method can significantly weaken signal noise, effectively extract the weak characteristic information of early fault, and facilitate the early fault warning and dynamic predictive maintenance.

  5. Developing a Geological Management Information System: National Important Mining Zone Database

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Geo-data is a foundation for the prediction and assessment of ore resources, so managing and making full use of those data, including the geography database, geology database, mineral deposits database, aeromagnetics database, gravity database, geochemistry database and remote sensing database, is very significant. We developed the national important mining zone database (NIMZDB) to manage 14 national important mining zone databases to support a new round of ore deposit prediction. We found that attention should be paid to the following issues: ① data accuracy: integrity, logic consistency, attribute, spatial and time accuracy; ② management of both attribute and spatial data in the same system; ③ transforming data between MapGIS and ArcGIS; ④ data sharing and security; ⑤ data searches that can query both attribute and spatial data. The accuracy of input data is guaranteed, and the search, analysis and translation of data between MapGIS and ArcGIS have been made convenient via the development of a data-checking module and a data-managing module based on MapGIS and ArcGIS. Using ArcSDE, we based data sharing on a client/server system, and attribute and spatial data are also managed in the same system.

  6. The research of road and vehicle information extraction algorithm based on high resolution remote sensing image

    Science.gov (United States)

    Zhou, Tingting; Gu, Lingjia; Ren, Ruizhi; Cao, Qiong

    2016-09-01

    With the rapid development of remote sensing technology, the spatial and temporal resolution of satellite imagery has also increased greatly. Meanwhile, high-spatial-resolution images are becoming increasingly popular for commercial applications. Remote sensing image technology has broad application prospects in intelligent traffic. Compared with traditional traffic information collection methods, vehicle information extraction using high-resolution remote sensing imagery has the advantages of high resolution and wide coverage. This has great guiding significance for urban planning, transportation management, travel route choice and so on. Firstly, this paper preprocessed the acquired high-resolution multi-spectral and panchromatic remote sensing images. After that, on the one hand, in order to get the optimal threshold for image segmentation, histogram equalization and linear enhancement technologies were applied to the preprocessing results. On the other hand, considering the distribution characteristics of roads, the normalized difference vegetation index (NDVI) and normalized difference water index (NDWI) were used to suppress the water and vegetation information in the preprocessing results. Then, the above two processing results were combined. Finally, geometric characteristics were used to complete the road information extraction. The extracted road vector was used to limit the target vehicle area. Target vehicle extraction was divided into bright vehicle extraction and dark vehicle extraction. Eventually, the extraction results for the two kinds of vehicles were combined to get the final results. The experimental results demonstrated that the proposed algorithm has a high precision for vehicle information extraction from different high-resolution remote sensing images. Among these results, the average fault detection rate was about 5.36%, the average residual rate was about 13.60% and the average accuracy was approximately 91.26%.
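
    The vegetation and water suppression step mentioned above uses the standard NDVI and NDWI band ratios. A small NumPy sketch is given below; the band ordering, the toy reflectance values and the thresholds are assumptions for illustration and depend on the actual sensor.

      # Suppress vegetation and water pixels using NDVI and NDWI (NumPy sketch).
      import numpy as np

      def road_candidate_mask(green, red, nir, ndvi_thresh=0.3, ndwi_thresh=0.1):
          """Bands are float arrays of identical shape; thresholds are illustrative."""
          eps = 1e-6
          ndvi = (nir - red) / (nir + red + eps)      # high for vegetation
          ndwi = (green - nir) / (green + nir + eps)  # high for water
          vegetation = ndvi > ndvi_thresh
          water = ndwi > ndwi_thresh
          return ~(vegetation | water)                # True where roads/vehicles may remain

      # Toy 2x2 example bands (reflectance-like values).
      green = np.array([[0.20, 0.05], [0.30, 0.10]])
      red   = np.array([[0.25, 0.04], [0.28, 0.09]])
      nir   = np.array([[0.26, 0.40], [0.05, 0.11]])
      print(road_candidate_mask(green, red, nir))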

  7. Information Mining Application Based on SOA%基于SOA的信息挖掘应用研究

    Institute of Scientific and Technical Information of China (English)

    江兆银; 刘瑶; 李斌; 朱迎华

    2011-01-01

    Traditional data mining generally works on specific, static data sources, but in the real world sources of information are complex and varied, especially network information; traditional mining algorithms cannot accurately extract useful information from large numbers of unstructured, dynamic data sources. This paper proposes a method for integrating network data based on a service-oriented architecture, so that network data from different units, different platforms and different data structures can be organically integrated and information sharing and interaction can be achieved. Through an applied case study on postgraduate admission adjustment data, it can be seen that integrating real-time postgraduate adjustment data with SOA-based Web Service technology achieves good results.

  8. [Hyperspectral extraction of soil available nitrogen in Nan Mountain coal waste scenic spot of Jinhuagong Mine based on enter-PLSR].

    Science.gov (United States)

    Lin, Li-xin; Wang, Yun-jia; Xiong, Ji-bing

    2014-06-01

    Soil available nitrogen content is an important index reflecting soil fertility. Monitoring and evaluating soil available nitrogen with hyperspectral technology can provide dynamic information for land reclamation and ecological restoration. Addressing the lack of studies of soil available nitrogen in National Mine Parks and the poor computational efficiency of the partial least squares regression (PLSR) method, the present paper examines the relationship between soil spectra and soil available nitrogen based on the spectral curves (ranging from 350 to 2 500 nm) of 30 salinized chestnut soil samples, which were collected from the southern mountain coal waste scenic spot located at the Jinhuagong mine in Datong city, Shanxi Province, China (part of the Jinhuagong national mine park). The soil reflectance spectra were mathematically transformed into first-derivative and inverse-log spectral curves, and corresponding estimation models were built and examined by PLSR and Enter-partial least squares regression (Enter-PLSR) based on characteristic absorption. The results indicated that the Enter-PLSR estimation model greatly increased computational efficiency by reducing the number of independent variables from 122 to 12 while keeping an accuracy close to that of the PLSR estimation model. By using hyperspectral technology and the Enter-PLSR method, the gap in studies of soil available nitrogen in National Mine Parks was filled, and, at the same time, the computational efficiency problem of PLSR was resolved.
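
    A compact sketch of fitting and cross-validating a PLSR model on first-derivative reflectance spectra with scikit-learn is given below. The synthetic spectra, the number of latent components and the derivative transform are illustrative assumptions; the paper's Enter-PLSR variable selection (reducing 122 wavelength variables to 12) is not reproduced here.

      # Fitting a PLSR model to (synthetic) first-derivative reflectance spectra.
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_samples, n_bands = 30, 122                 # sizes echo the abstract, data is synthetic
      spectra = rng.normal(size=(n_samples, n_bands))
      nitrogen = (spectra[:, 10] * 2.0 + spectra[:, 55] - spectra[:, 90]
                  + rng.normal(scale=0.1, size=n_samples))

      deriv = np.gradient(spectra, axis=1)         # first-derivative transform of the spectra

      pls = PLSRegression(n_components=5)          # number of latent components is illustrative
      scores = cross_val_score(pls, deriv, nitrogen, cv=5, scoring="r2")
      print("cross-validated R^2:", scores.mean())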

  9. The Technology of Extracting Content Information from Web Page Based on DOM Tree

    Science.gov (United States)

    Yuan, Dingrong; Mo, Zhuoying; Xie, Bing; Xie, Yangcai

    Web pages contain huge amounts of information, which includes content information and other useless information such as navigation, advertisements and Flash animations. To reduce the toil of Web users, we established a technique to extract the content information from Web pages. Firstly, we analyzed the semantics of Web documents with the V8 engine of Google and parsed the Web document into a DOM tree. We then traversed the DOM tree and pruned it in light of the characteristics of the Web page's markup language. Finally, we extracted the content information from the Web page. Theory and experiments showed that the technique can simplify the Web page, present the content information to Web users and supply clean data for application areas such as retrieval, KDD and DM on the Web.
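
    The described technique parses the page into a DOM tree, prunes non-content nodes and keeps what remains. A rough equivalent with BeautifulSoup is sketched below; the library, the tag list and the sample page are assumptions made for illustration, since the paper itself parses documents with Google's V8 engine.

      # Prune obvious non-content nodes from an HTML DOM and keep the remaining text.
      # BeautifulSoup is used here for illustration; the paper itself parses with V8.
      from bs4 import BeautifulSoup

      NON_CONTENT_TAGS = ["script", "style", "nav", "header", "footer", "aside", "form"]

      def extract_content(html):
          soup = BeautifulSoup(html, "html.parser")
          for tag in soup.find_all(NON_CONTENT_TAGS):
              tag.decompose()                       # remove the node and its subtree
          return " ".join(soup.get_text(separator=" ").split())

      html = """<html><body>
        <nav>Home | About</nav>
        <div id="main"><h1>Title</h1><p>The actual article text lives here.</p></div>
        <footer>Copyright</footer>
      </body></html>"""
      print(extract_content(html))   # -> "Title The actual article text lives here."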

  10. The Application of the Web Text Mining in the Druggist Interest Extraction%Web文本挖掘在药商兴趣提取中的应用

    Institute of Scientific and Technical Information of China (English)

    孙士新

    2014-01-01

    Information acquisition has become an important component of druggists' business operations and a basis for their market judgments. The appearance of large amounts of unstructured and semi-structured information on the network has provided the technical space and empirical basis for personalized services for druggists. By designing the key text mining techniques for personalized service and applying the text mining process of a traditional Chinese medicinal materials information website, the paper applies text mining technology to an example of acquiring user interests from such a website, realizing the automatic acquisition of user interests.

  11. Data Mining Using Neural–Genetic Approach: A Review

    Directory of Open Access Journals (Sweden)

    Parvez Rahi

    2014-04-01

    Full Text Available In the advanced age of technology, there is an increasing availability of digital documents in various languages in various fields. Data mining is gaining popularity in the field of knowledge discovery. Data mining is the knowledge discovery process by which we can analyze large amounts of data from various data repositories and summarize them into useful information. Owing to the importance of extracting information/knowledge from large data repositories, data mining has become an essential part of human life in various fields. Data mining has a very wide area of applications, and these applications have enriched human life in various fields including science, medicine, business, education etc. Here in this paper we discuss the role of neural networks and genetic algorithms in the field of data mining.

  12. ONTOLOGY BASED DATA MINING METHODOLOGY FOR DISCRIMINATION PREVENTION

    Directory of Open Access Journals (Sweden)

    Nandana Nagabhushana

    2014-09-01

    Full Text Available Data mining is being increasingly used in the field of automating decision-making processes, which involve the extraction and discovery of information hidden in large volumes of collected data. Nonetheless, there are negative perceptions like privacy invasion and potential discrimination which contribute as hindrances to the use of data mining methodologies in software systems employing automated decision making. Loan granting, employment, insurance premium calculation, admissions in educational institutions etc. can make use of data mining to effectively prevent human biases pertaining to certain attributes like gender, nationality, race etc. in critical decision making. The proposed methodology prevents discriminatory rules that ensue due to the presence of certain information regarding sensitive discriminatory attributes in the data itself. Two novel aspects of the proposal are, first, the rule mining technique based on ontologies and, second, the generalization and transformation of the mined rules that are quantized as discriminatory into non-discriminatory ones.

  13. Rise of Data Mining: Current and Future Application Areas

    Directory of Open Access Journals (Sweden)

    Dharminder Kumar

    2011-09-01

    Full Text Available Knowledge has played a significant role in human activities since the beginning of human development. Data mining is the process of knowledge discovery in which knowledge is gained by analyzing the data stored in very large repositories, which are analyzed from various perspectives, and the result is summarized into useful information. Due to the importance of extracting knowledge/information from large data repositories, data mining has become a very important branch of engineering affecting human life in various spheres directly or indirectly. Advancements in statistics, machine learning, artificial intelligence, pattern recognition and computational capabilities have given present-day data mining functionality a new height. Data mining has various applications, and these applications have enriched the various fields of human life including business, education, medicine, science etc. The objective of this paper is to discuss various improvements and breakthroughs in the field of data mining from the past to the present and also to explore future trends.

  14. Preprocessing Techniques for Image Mining on Biopsy Images

    Directory of Open Access Journals (Sweden)

    Ms. Nikita Ramrakhiani

    2015-08-01

    Full Text Available Biomedical imaging has been undergoing rapid technological advancements over the last several decades and has seen the development of many new applications. A single image can give all the details about an organ, from the cellular level to the whole-organ level. Biomedical imaging is becoming increasingly important as an approach to synthesize, extract and translate useful information from the large multidimensional databases accumulated in research frontiers such as functional genomics, proteomics, and functional imaging. Image mining can be used to realize this approach. Image mining will bridge this gap to extract and translate semantically meaningful information from biomedical images and apply it to testing and detecting any anomaly in the target organ. The essential component in image mining is identifying similar objects in different images and finding correlations in them. Integration of image mining and the biomedical field can result in many real-world applications.

  15. What do professional forecasters' stock market expectations tell us about herding, information extraction and beauty contests?

    DEFF Research Database (Denmark)

    Rangvid, Jesper; Schmeling, M.; Schrimpf, A.

    2013-01-01

    We study how professional forecasters form equity market expectations based on a new micro-level dataset which includes rich cross-sectional information about individual characteristics. We focus on testing whether agents rely on the beliefs of others, i.e., consensus expectations, when forming t...... that neither information extraction to incorporate dispersed private information, nor herding for reputational reasons can fully explain these results, leaving Keynes' beauty contest argument as a potential candidate for explaining forecaster behavior....

  16. Extraction of Hidden Social Networks from Wiki-Environment Involved in Information Conflict

    OpenAIRE

    Alguliyev, Rasim M.; Ramiz M. Aliguliyev; Irada Y. Alakbarova

    2016-01-01

    Social network analysis is a widely used technique to analyze relationships among wiki-users in Wikipedia. In this paper a method to identify hidden social networks participating in information conflicts in the wiki-environment is proposed. In particular, we describe how text clustering techniques can be used for the extraction of hidden social networks of wiki-users involved in an information conflict. By clustering unstructured text articles that caused an information conflict we ...

  17. Mining free-text medical records.

    Science.gov (United States)

    Heinze, D T; Morsch, M L; Holbrook, J

    2001-01-01

    Text mining projects can be characterized along four parameters: 1) the demands of the market in terms of target domain and specificity and depth of queries; 2) the volume and quality of text in the target domain; 3) the text mining process requirements; and 4) the quality assurance process that validates the extracted data. In this paper, we provide lessons learned and results from a large-scale commercial project using Natural Language Processing (NLP) for mining the transcriptions of dictated clinical records in a variety of medical specialties. We conclude that the current state-of-the-art in NLP is suitable for mining information of moderate content depth across a diverse collection of medical settings and specialties.

  18. Extracting information from the data flood of new solar telescopes. Brainstorming

    CERN Document Server

    Ramos, A Asensio

    2012-01-01

    Extracting magnetic and thermodynamic information from spectropolarimetric observations is a difficult and time consuming task. The amount of science-ready data that will be generated by the new family of large solar telescopes is so large that we will be forced to modify the present approach to inference. In this contribution, I propose several possible ways that might be useful for extracting the thermodynamic and magnetic properties of solar plasmas from such observations quickly.

  19. Extracting Information from the Data Flood of New Solar Telescopes: Brainstorming

    Science.gov (United States)

    Asensio Ramos, A.

    2012-12-01

    Extracting magnetic and thermodynamic information from spectropolarimetric observations is a difficult and time consuming task. The amount of science-ready data that will be generated by the new family of large solar telescopes is so large that we will be forced to modify the present approach to inference. In this contribution, I propose several possible ways that might be useful for extracting the thermodynamic and magnetic properties of solar plasmas from such observations quickly.

  20. Seismic surveying for coal mine planning

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, B. [CMTE/CSIRO Exploration and Mining, Kenmore, Qld. (Australia)

    2002-07-01

    More and more coal in Australia is extracted by underground mining methods, especially by longwall mining. These methods can be particularly sensitive to relatively small-scale structural discontinuities and variations in roof and floor rock character. Traditionally, information on these features has been obtained through drilling. However, this is an expensive process and its relevance is limited to the immediate neighbourhood of the boreholes. Seismic surveying, especially 3D seismic surveying, is an alternative tool for geological structure delineation. It is one of the most effective geophysical methods available for identification of geological structures such as faults, folds, washouts, seam splits and thickness changes, which are normally associated with potential mining hazards. Seismic data can even be used for stratigraphic identification. The information extracted from seismic data can be integrated into mine planning and design. In this paper, computer-aided interpretation techniques for maximising the information from seismic data are demonstrated and the ability of seismic reflection methods to resolve localised geological features is illustrated. Both synthetic and real seismic data obtained in recent 2D and 3D seismic surveys from Australian coal mines are used. 7 refs., 9 figs.

  1. Applying Data Mining Techniques to Improve Information Security in the Cloud: A Single Cache System Approach

    Directory of Open Access Journals (Sweden)

    Amany AlShawi

    2016-01-01

    Full Text Available Presently, the popularity of cloud computing is gradually increasing day by day. The purpose of this research was to enhance the security of the cloud using techniques such as data mining with specific reference to the single cache system. From the findings of the research, it was observed that the security in the cloud could be enhanced with the single cache system. For future purposes, an Apriori algorithm can be applied to the single cache system. This can be applied by all cloud providers, vendors, data distributors, and others. Further, data objects entered into the single cache system can be extended into 12 components. Database and SPSS modelers can be used to implement the same.
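
    The abstract suggests applying an Apriori algorithm to the single cache system as future work. A toy Apriori-style frequent-itemset pass in plain Python is sketched below to show what such an application would compute; the cache-access "transactions" and the support threshold are invented for illustration.

      # Toy Apriori-style frequent-itemset mining over cache-access "transactions".
      from itertools import combinations

      transactions = [
          {"objA", "objB", "objC"},
          {"objA", "objB"},
          {"objB", "objC"},
          {"objA", "objB", "objD"},
      ]
      min_support = 2          # minimum number of transactions containing an itemset

      def frequent_itemsets(transactions, min_support):
          frequent = {}
          # Level 1: candidate single-item sets.
          candidates = [frozenset([i]) for i in {i for t in transactions for i in t}]
          while candidates:
              counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
              survivors = {c: n for c, n in counts.items() if n >= min_support}
              frequent.update(survivors)
              # Level k+1: join surviving k-itemsets that differ by exactly one item.
              candidates = list({a | b for a, b in combinations(survivors, 2)
                                 if len(a | b) == len(a) + 1})
          return frequent

      for itemset, count in frequent_itemsets(transactions, min_support).items():
          print(sorted(itemset), count)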

  2. Research of building information extraction and evaluation based on high-resolution remote-sensing imagery

    Science.gov (United States)

    Cao, Qiong; Gu, Lingjia; Ren, Ruizhi; Wang, Lang

    2016-09-01

    Building extraction is currently important in the application of high-resolution remote sensing imagery. At present, quite a few algorithms are available for detecting building information; however, most of them still have some obvious disadvantages, such as the ignorance of spectral information and the contradiction between extraction rate and extraction accuracy. The purpose of this research is to develop an effective method to detect building information in Chinese GF-1 data. Firstly, an image preprocessing technique is used to normalize the image, and image enhancement is used to highlight the useful information in the image. Secondly, multi-spectral information is analyzed. Subsequently, an improved morphological building index (IMBI) based on remote sensing imagery is proposed to get the candidate building objects. Furthermore, in order to refine the building objects and further remove false objects, post-processing (e.g., shape features, the vegetation index and the water index) is employed. To validate the effectiveness of the proposed algorithm, the omission errors (OE), commission errors (CE), the overall accuracy (OA) and Kappa are used for the final evaluation. The proposed method can not only effectively use spectral information and other basic features, but also avoid extracting excessive interference details from high-resolution remote sensing images. Compared to the original MBI algorithm, the proposed method reduces the OE by 33.14%. At the same time, Kappa increases by 16.09%. In the experiments, IMBI achieved satisfactory results and outperformed other algorithms in terms of both accuracy and visual inspection.

  3. Extracting important information from Chinese Operation Notes with natural language processing methods.

    Science.gov (United States)

    Wang, Hui; Zhang, Weide; Zeng, Qiang; Li, Zuofeng; Feng, Kaiyan; Liu, Lei

    2014-04-01

    Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although Natural Language Processing (NLP) methods have been profoundly studied in electronic medical records (EMR), few studies have explored NLP in extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of extracting tumor-related information from operation notes of hepatic carcinomas which were written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluating on 29 unseen operation notes, our best approach yielded 69.6% precision, 58.3% recall and a 63.5% F-score. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. Automatically extracting clinically useful sentences from UpToDate to support clinicians' information needs.

    Science.gov (United States)

    Mishra, Rashmi; Del Fiol, Guilherme; Kilicoglu, Halil; Jonnalagadda, Siddhartha; Fiszman, Marcelo

    2013-01-01

    Clinicians raise several information needs in the course of care. Most of these needs can be met by online health knowledge resources such as UpToDate. However, finding relevant information in these resources often requires significant time and cognitive effort. To design and assess algorithms for extracting from UpToDate the sentences that represent the most clinically useful information for patient care decision making. We developed algorithms based on semantic predications extracted with SemRep, a semantic natural language processing parser. Two algorithms were compared against a gold standard composed of UpToDate sentences rated in terms of clinical usefulness. Clinically useful sentences were strongly correlated with predication frequency (correlation= 0.95). The two algorithms did not differ in terms of top ten precision (53% vs. 49%; p=0.06). Semantic predications may serve as the basis for extracting clinically useful sentences. Future research is needed to improve the algorithms.

  5. OPINION MINING AND SENTIMENT ANALYSIS TECHNIQUES: A RECENT SURVEY

    OpenAIRE

    Ms. Kalyani D. Gaikwad*, Prof. Sonawane V.R

    2016-01-01

    Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. Sentiment analysis is widely applied to reviews and social media for a variety of applications, ranging from marketing to customer service. The difficulties of performing sentiment analysis in this domain can be overcome by leveraging on common-sense knowledge bases. Opinion Mining is...

  6. Mining Mid-level Features for Image Classification

    OpenAIRE

    Fernando, Basura; Fromont, Elisa; Tuytelaars, Tinne

    2014-01-01

    International audience; Mid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine...

  7. Application of the spatial data mining module in analysis of mining ground deformation factors

    Directory of Open Access Journals (Sweden)

    Jan Blachowski

    2013-09-01

    Full Text Available Spatial data mining methods, for example those based on artificial neural networks (ANN), allow extraction of information from databases and detection of otherwise hidden patterns occurring in these data, and in consequence the acquisition of new knowledge on the analysed phenomena or processes. One of these techniques is multivariate statistical analysis, which facilitates identification of patterns otherwise difficult to observe. In the paper an attempt at applying self-organising maps (SOM) to explore and analyse spatial data related to studies of ground subsidence associated with underground mining is described. The study has been carried out on a selected part of a former underground coal mining area in SW Poland with the aim to analyse the influence of particular ground deformation factors on the observed subsidence and the relationships between these factors. The research concerned the uppermost coal panels and the following factors: mining system, time of mining activity, and inclination, thickness and depth below the ground of the exploited coal panels. It has been found that exploratory spatial data analysis can be used to identify relationships in multidimensional data related to mining-induced ground subsidence. The proposed approach may be found useful in the identification of areas threatened by mining-related subsidence and in creating scenarios of developing deformation zones, and may therefore aid the spatial development of mining grounds.
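
    A bare-bones self-organising map in NumPy is sketched below to make the SOM idea above concrete; the grid size, learning-rate schedule and the synthetic deformation-factor records are all illustrative assumptions rather than the study's actual configuration.

      # Minimal self-organising map (SOM) trained on synthetic deformation-factor records.
      import numpy as np

      rng = np.random.default_rng(42)
      # Columns could stand for inclination, thickness, depth, time of mining (scaled to [0,1]).
      data = rng.random((200, 4))

      grid_h, grid_w, dim = 6, 6, data.shape[1]
      weights = rng.random((grid_h, grid_w, dim))
      n_iter, lr0, sigma0 = 2000, 0.5, 2.0

      for t in range(n_iter):
          x = data[rng.integers(len(data))]
          # Best matching unit (BMU): node whose weight vector is closest to the sample.
          dists = np.linalg.norm(weights - x, axis=2)
          bmu = np.unravel_index(np.argmin(dists), dists.shape)
          # Decaying learning rate and neighbourhood radius.
          lr = lr0 * np.exp(-t / n_iter)
          sigma = sigma0 * np.exp(-t / n_iter)
          rows, cols = np.indices((grid_h, grid_w))
          grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
          influence = np.exp(-grid_dist2 / (2 * sigma ** 2))[..., None]
          weights += lr * influence * (x - weights)

      # Map each record to its BMU to inspect how the factors cluster on the grid.
      bmus = [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)), (grid_h, grid_w))
              for x in data]
      print(bmus[:5])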

  8. Sequential extractions on mine tailings samples after and before bioassays: implications on the speciation of metals during microbial re-colonization

    Science.gov (United States)

    García-Meza, J. V.; Carrillo-Chávez, A.; Morton-Bermea, O.

    2006-01-01

    Mine tailings may be remediated using metal-tolerant microorganisms, as they may resolve the conditions that limit healthy development of plants (i.e., low organic matter content and poor physical conditions). The aim of this study was to investigate the consequences of microbial colonization on the chemical speciation of trace metals. Surface samples from the Valenciana mine tailings (Guanajuato, Mexico) were used for long-term bioassays (BA), which consisted in promoting the development of microorganisms on tailings material under stable laboratory conditions (humidity, temperature, and light exposure). A five-step sequential extraction method (exchangeable, carbonate/specifically adsorbed, Fe-Mn oxides, organic matter (OM)/sulfide, and residual fractions) was performed before and after the BA. Extraction solutions and leachates were analyzed by inductively coupled plasma-mass spectrometry. OM content, cationic exchange capacity, and pH values were also assessed before and after the BA. The results indicate that trace elements are generally present in nonresidual fractions, mainly in the Fe-Mn oxides fraction. The concentration of total Zn, As, Se, Pb, and exchangeable Cu and Pb is above the recommended limits for soils. Despite the high bioavailability of the former elements, biofilms successfully colonized the tailings samples during the BA. Cyanobacteria and green algae, heterotrophic fungi, aerobic bacteria, and anaerobic bacteria composed the developed biofilms. Chemical controls of trace elements could be attributed to absorption onto inorganic complexes (carbonates, metal oxides), while biofilm occurrence seems to enhance complexation and immobilization of Cr, Ni, Cu, Zn, As, and Pb. The biofilms developed do not increase the bioavailable forms and the leaching of the trace elements, but significantly improve the OM content (natural fertilization). The results suggest that biofilms are useful during the first steps of mine tailings remediation.

  9. Extraction of Informative Blocks from Deep Web Page Using Similar Layout Feature

    OpenAIRE

    Zeng, Jun; Flanagan, Brendan; Hirokawa, Sachio

    2013-01-01

    Due to the explosive growth and popularity of the deep web, information extraction from deep web pages has gained more and more attention. However, the HTML structure of web pages has become more complicated, making it difficult to recognize target content by only analyzing the HTML source code. In this paper, we propose a method to extract the informative blocks from a deep web page using layout features. We consider the visual rectangular region of an HTML element as a visual block in the web page....

  10. Information extraction for legal knowledge representation – a review of approaches and trends

    Directory of Open Access Journals (Sweden)

    Denis Andrei de Araujo

    2014-11-01

    Full Text Available This work presents an introduction to Information Extraction systems and a survey of the known approaches of Information Extraction in the legal area. This work analyzes with particular attention the techniques that rely on the representation of legal knowledge as a means to achieve better performance, with emphasis on those techniques including ontologies and linguistic support. Some details of the systems implementations are presented, followed by an analysis of the positive and negative points of each approach, aiming to bring the reader a critical position regarding the solutions studied.

  11. Extracting information from two-dimensional electrophoresis gels by partial least squares regression

    DEFF Research Database (Denmark)

    Jessen, Flemming; Lametsch, R.; Bendixen, E.;

    2002-01-01

    Two-dimensional gel electrophoresis (2-DE) produces large amounts of data, and extraction of relevant information from these data demands a cautious and time-consuming process of spot pattern matching between gels. The classical approach of data analysis is to detect protein markers that appear ... of all proteins/spots in the gels. In the present study it is demonstrated how information can be extracted by multivariate data analysis. The strategy is based on partial least squares regression followed by variable selection to find proteins that individually or in combination with other proteins vary ...
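
    For readers unfamiliar with the technique, the following is a minimal sketch, not the authors' pipeline, of partial least squares regression on a simulated spot-intensity matrix followed by a naive variable-selection step; the matrix sizes, component count, and selection rule are illustrative assumptions.

      # Minimal sketch: PLS regression on simulated 2-DE spot intensities,
      # then rank spots by the magnitude of their regression coefficients.
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(0)
      n_gels, n_spots = 30, 200                # assumed experiment size
      X = rng.normal(size=(n_gels, n_spots))   # spot intensities (one row per gel)
      y = 2.0 * X[:, 10] - 1.5 * X[:, 42] + rng.normal(scale=0.5, size=n_gels)

      pls = PLSRegression(n_components=3)
      pls.fit(X, y)

      # Naive variable selection: spots with the largest absolute coefficients.
      coefs = np.ravel(pls.coef_)
      top_spots = np.argsort(np.abs(coefs))[::-1][:5]
      print("most influential spots:", top_spots)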

  12. Ultrasonic Signal Processing Algorithm for Crack Information Extraction on the Keyway of Turbine Rotor Disk

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Hong Kyu; Seo, Won Chan; Park, Chan [Pukyong National University, Busan (Korea, Republic of); Lee, Jong O; Son, Young Ho [KIMM, Daejeon (Korea, Republic of)

    2009-10-15

    An ultrasonic signal processing algorithm was developed for extracting information on cracks generated around the keyway of a turbine rotor disk. B-scan images were obtained using keyway specimens and an ultrasonic scan system with an x-y position controller. The B-scan images were used as input images for two-dimensional signal processing, and the algorithm was constructed with four processing stages: pre-processing, crack candidate region detection, crack region classification, and crack information extraction. Experiments confirmed that the developed algorithm is effective for the quantitative evaluation of cracks generated around the keyway of a turbine rotor disk.
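
    The paper's algorithm is not reproduced here; the sketch below is only a generic skeleton of such a four-stage B-scan pipeline (smoothing, threshold-based candidate detection, size-based classification, and extraction of region geometry), with a synthetic image, threshold, and size criterion chosen purely for illustration.

      # Generic sketch of a four-stage B-scan crack-detection pipeline.
      import numpy as np
      from scipy import ndimage

      rng = np.random.default_rng(1)
      bscan = rng.normal(0.1, 0.05, size=(128, 256))  # synthetic B-scan background
      bscan[60:64, 100:140] += 0.8                    # synthetic crack indication

      # 1) Pre-processing: smooth speckle-like noise.
      smoothed = ndimage.gaussian_filter(bscan, sigma=1.5)

      # 2) Crack candidate region detection: amplitude threshold + labelling.
      candidates, n_regions = ndimage.label(smoothed > 0.5)

      # 3) Crack region classification and 4) crack information extraction:
      #    keep sufficiently large regions and report their extent.
      for label, slc in enumerate(ndimage.find_objects(candidates), start=1):
          area = int((candidates[slc] == label).sum())
          if area < 50:                               # assumed size criterion
              continue
          rows, cols = slc
          print(f"crack region {label}: rows {rows.start}-{rows.stop - 1}, "
                f"cols {cols.start}-{cols.stop - 1}, area {area} px")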

  13. Information Extraction of High-Resolution Remotely Sensed Image Based on Multiresolution Segmentation

    Directory of Open Access Journals (Sweden)

    Peng Shao

    2014-08-01

    Full Text Available The principle of multiresolution segmentation is presented in detail in this study, and the Canny algorithm was applied for edge detection of a remotely sensed image based on this principle. The target image was divided into regions based on object-oriented multiresolution segmentation and edge detection. Furthermore, an object hierarchy was created, and a series of features (water bodies, vegetation, roads, residential areas, bare land, and other information) were extracted using spectral and geometrical features. The results indicate that edge detection has a positive effect on multiresolution segmentation, and the overall accuracy of information extraction reaches 94.6% according to the confusion matrix.
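
    As a small illustration of two ingredients mentioned above, Canny edge detection and overall accuracy from a confusion matrix, and not of the object-oriented segmentation itself, the following sketch runs a Canny detector on a toy image and evaluates a hypothetical confusion matrix.

      # Edge detection plus overall accuracy from a confusion matrix (toy data).
      import numpy as np
      from skimage import feature

      image = np.zeros((64, 64))
      image[16:48, 16:48] = 1.0                 # bright square on dark background
      edges = feature.canny(image, sigma=1.0)   # boolean edge map
      print("edge pixels found:", int(edges.sum()))

      # Hypothetical confusion matrix for four land-cover classes (rows = truth).
      cm = np.array([[50,  2,  1,  0],
                     [ 3, 40,  2,  1],
                     [ 1,  1, 45,  2],
                     [ 0,  2,  1, 38]])
      overall_accuracy = np.trace(cm) / cm.sum()
      print(f"overall accuracy: {overall_accuracy:.1%}")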

  14. A geographical information system-based analysis of cancer mortality and population exposure to coal mining activities in West Virginia, United States of America.

    Science.gov (United States)

    Hendryx, Michael; Fedorko, Evan; Anesetti-Rothermel, Andrew

    2010-05-01

    Cancer incidence and mortality rates are high in West Virginia compared to the rest of the United States of America. Previous research has suggested that exposure to activities of the coal mining industry may contribute to elevated cancer mortality, although exposure measures have been limited. This study tests alternative specifications of exposure to mining activity to determine whether a measure based on location of mines, processing plants, coal slurry impoundments and underground slurry injection sites relative to population levels is superior to a previously-reported measure of exposure based on tons mined at the county level, in the prediction of age-adjusted cancer mortality rates. To this end, we utilize two geographical information system (GIS) techniques--exploratory spatial data analysis and inverse distance mapping--to construct new statistical analyses. Total, respiratory and "other" age-adjusted cancer mortality rates in West Virginia were found to be more highly associated with the GIS-exposure measure than the tonnage measure, before and after statistical control for smoking rates. The superior performance of the GIS measure, based on where people in the state live relative to mining activity, suggests that activities of the industry contribute to cancer mortality. Further confirmation of observed phenomena is necessary with person-level studies, but the results add to the body of evidence that coal mining poses environmental risks to population health in West Virginia.
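
    Inverse distance weighting, one of the GIS ingredients referred to above, can be written compactly; the sketch below is a generic illustration with made-up mine coordinates and activity scores, not the study's exposure model.

      # Generic inverse-distance-weighted (IDW) interpolation of a mining-activity
      # score at population centroids; all coordinates and values are made up.
      import numpy as np

      def idw(sources_xy, source_values, targets_xy, power=2.0, eps=1e-9):
          """IDW estimate at each target: sum(w_i * z_i) / sum(w_i), w_i = 1/d_i^p."""
          d = np.linalg.norm(targets_xy[:, None, :] - sources_xy[None, :, :], axis=2)
          w = 1.0 / np.power(d + eps, power)
          return (w @ source_values) / w.sum(axis=1)

      mines = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])  # facility locations
      activity = np.array([3.0, 1.0, 2.0])                     # relative intensity
      towns = np.array([[1.0, 1.0], [9.0, 1.0], [5.0, 4.0]])   # population centroids
      print(idw(mines, activity, towns))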

  15. A geographical information system-based analysis of cancer mortality and population exposure to coal mining activities in West Virginia, United States of America

    Directory of Open Access Journals (Sweden)

    Michael Hendryx

    2010-05-01

    Full Text Available Cancer incidence and mortality rates are high in West Virginia compared to the rest of the United States of America. Previous research has suggested that exposure to activities of the coal mining industry may contribute to elevated cancer mortality, although exposure measures have been limited. This study tests alternative specifications of exposure to mining activity to determine whether a measure based on location of mines, processing plants, coal slurry impoundments and underground slurry injection sites relative to population levels is superior to a previously-reported measure of exposure based on tons mined at the county level, in the prediction of age-adjusted cancer mortality rates. To this end, we utilize two geographical information system (GIS) techniques – exploratory spatial data analysis and inverse distance mapping – to construct new statistical analyses. Total, respiratory and “other” age-adjusted cancer mortality rates in West Virginia were found to be more highly associated with the GIS-exposure measure than the tonnage measure, before and after statistical control for smoking rates. The superior performance of the GIS measure, based on where people in the state live relative to mining activity, suggests that activities of the industry contribute to cancer mortality. Further confirmation of observed phenomena is necessary with person-level studies, but the results add to the body of evidence that coal mining poses environmental risks to population health in West Virginia.

  16. MapReduce Functions to Analyze Sentiment Information from Social Big Data

    OpenAIRE

    Ilkyu Ha; Bonghyun Back; Byoungchul Ahn

    2015-01-01

    Opinion mining, which extracts meaningful opinion information from large amounts of social multimedia data, has recently arisen as a research area. In particular, opinion mining has been used to understand the true meaning and intent of social networking site users. It requires efficient techniques to collect a large amount of social multimedia data and extract meaningful information from them. Therefore, in this paper, we propose a method to extract sentiment information from various types o...
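
    The paper's MapReduce functions are not given here; as a rough illustration of the general pattern, the sketch below emits (sentiment, 1) pairs in a map step and aggregates them in a reduce step over a few made-up posts, with purely illustrative word lists.

      # Toy map/reduce pair for counting sentiment-bearing words in posts.
      from collections import defaultdict

      POSITIVE = {"good", "great", "love", "happy"}   # illustrative lexicon only
      NEGATIVE = {"bad", "hate", "awful", "sad"}

      def map_post(post):
          """Map step: emit a (sentiment, 1) pair for every opinion word."""
          for word in post.lower().split():
              if word in POSITIVE:
                  yield ("positive", 1)
              elif word in NEGATIVE:
                  yield ("negative", 1)

      def reduce_counts(pairs):
          """Reduce step: sum the counts per sentiment key."""
          totals = defaultdict(int)
          for key, value in pairs:
              totals[key] += value
          return dict(totals)

      posts = ["I love this phone and its great battery",
               "awful service and I hate waiting",
               "happy with the update"]
      pairs = (pair for post in posts for pair in map_post(post))
      print(reduce_counts(pairs))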

  17. Data Mining for Secure Software Engineering – Source Code Management Tool Case Study

    Directory of Open Access Journals (Sweden)

    A.V. Krishna Prasad

    2010-07-01

    Full Text Available As Data Mining for Secure Software Engineering improves software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining software engineering data poses several challenges, requiring various algorithms to effectively mine sequences, graphs, and text from such data. Software engineering data includes code bases, execution traces, historical code changes, mailing lists, and bug databases. They contain a wealth of information about a project's status, progress, and evolution. Using well-established data mining techniques, practitioners and researchers can explore the potential of this valuable data in order to better manage their projects and to produce higher-quality software systems that are delivered on time and within budget. Data mining can be used in gathering and extracting latent security requirements, extracting algorithms and business rules from code, mining legacy applications for requirements and business rules for new projects, etc. Mining algorithms for software engineering fall into four main categories: Frequent pattern mining – finding commonly occurring patterns; Pattern matching – finding data instances for given patterns; Clustering – grouping data into clusters; and Classification – predicting labels of data based on already labeled data. In this paper, we discuss an overview of strategies for data mining for secure software engineering, with the implementation of a case study of text mining for a source code management tool.
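
    As a small, hedged example of the first category (frequent pattern mining) applied to software engineering data, the sketch below counts file pairs that are frequently changed together across commits; the commit sets and support threshold are invented for illustration.

      # Toy frequent-pattern mining: file pairs that are often changed together.
      from collections import Counter
      from itertools import combinations

      commits = [                      # made-up change sets from a version history
          {"auth.py", "db.py", "tests/test_auth.py"},
          {"auth.py", "tests/test_auth.py"},
          {"db.py", "schema.sql"},
          {"auth.py", "db.py", "tests/test_auth.py"},
      ]

      min_support = 2                  # pair must co-occur in at least 2 commits
      pair_counts = Counter(
          pair for files in commits for pair in combinations(sorted(files), 2)
      )
      frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
      for pair, count in sorted(frequent.items(), key=lambda kv: -kv[1]):
          print(pair, count)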

  18. Analysis of biological processes and diseases using text mining approaches.

    Science.gov (United States)

    Krallinger, Martin; Leitner, Florian; Valencia, Alfonso

    2010-01-01

    A number of biomedical text mining systems have been developed to extract biologically relevant information directly from the literature, complementing bioinformatics methods in the analysis of experimentally generated data. We provide a short overview of the general characteristics of natural language data, existing biomedical literature databases, and lexical resources relevant in the context of biomedical text mining. A selected number of practically useful systems are introduced together with the type of user queries supported and the results they generate. The extraction of biological relationships, such as protein-protein interactions as well as metabolic and signaling pathways using information extraction systems, will be discussed through example cases of cancer-relevant proteins. Basic strategies for detecting associations of genes to diseases together with literature mining of mutations, SNPs, and epigenetic information (methylation) are described. We provide an overview of disease-centric and gene-centric literature mining methods for linking genes to phenotypic and genotypic aspects. Moreover, we discuss recent efforts for finding biomarkers through text mining and for gene list analysis and prioritization. Some relevant issues for implementing a customized biomedical text mining system will be pointed out. To demonstrate the usefulness of literature mining for the molecular oncology domain, we implemented two cancer-related applications. The first tool consists of a literature mining system for retrieving human mutations together with supporting articles. Specific gene mutations are linked to a set of predefined cancer types. The second application consists of a text categorization system supporting breast cancer-specific literature search and document-based breast cancer gene ranking. Future trends in text mining emphasize the importance of community efforts such as the BioCreative challenge for the development and integration of multiple systems into
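
    None of the surveyed systems are reproduced here, but a bare-bones document categorization step of the kind described (e.g., routing abstracts to a disease-specific collection) can be sketched as follows; the tiny corpus and labels are invented for illustration.

      # Bare-bones document categorization with TF-IDF features (toy corpus).
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      docs = [
          "BRCA1 mutation carriers and breast cancer risk",
          "tamoxifen response in breast cancer patients",
          "EGFR inhibitors in non-small cell lung cancer",
          "smoking history and lung cancer incidence",
      ]
      labels = ["breast", "breast", "lung", "lung"]

      classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
      classifier.fit(docs, labels)
      print(classifier.predict(["HER2 amplification in breast tumours"]))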

  19. A Meta-information-Based Method for Rough Sets Rule Parallel Mining%基于元信息的粗糙集规则并行挖掘方法

    Institute of Scientific and Technical Information of China (English)

    苏健; 高济

    2003-01-01

    Rough sets is an important method of data mining. Data mining processes such a great quantity of data in large databases that the speed of the rough sets data mining algorithm is critical to the data mining system. Utilizing network computing resources is an effective approach to improving the performance of a data mining system. This paper proposes the concept of meta-information, which is used to describe the result of rough sets data mining in an information system, and a meta-information-based method for parallel rule mining. The method decomposes the information system into many sub-information-systems, dispatches the task of generating the meta-information of each sub-information-system to task performers in the network, lets them compute the meta-information in parallel, synthesizes the meta-information of the sub-information-systems into the meta-information of the whole information system in the task synthesizer, and finally produces the rules according to the meta-information.
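
    A rough sketch of the decompose/compute-in-parallel/synthesize idea, not the authors' rough-set algorithm, is given below: the "meta-information" here is simply the counts of (condition attributes, decision) combinations in each block, merged by a synthesizer and turned into naive rules where a condition combination has a unique decision.

      # Sketch: split a decision table into blocks, compute per-block
      # "meta-information" (condition/decision counts) in parallel, merge,
      # and emit rules where a condition combination has a unique decision.
      from collections import Counter
      from multiprocessing import Pool

      TABLE = [  # (condition attributes ..., decision); invented toy data
          ("sunny", "hot", "no"), ("sunny", "mild", "no"),
          ("rainy", "mild", "yes"), ("rainy", "hot", "yes"),
          ("sunny", "hot", "no"), ("rainy", "mild", "yes"),
      ]

      def block_meta(rows):
          """Meta-information of one sub-table: counts per (conditions, decision)."""
          return Counter((row[:-1], row[-1]) for row in rows)

      def synthesize(metas):
          """Merge block meta-information; keep conditions with one decision."""
          total = Counter()
          for meta in metas:
              total += meta
          decisions = {}
          for (cond, dec) in total:
              decisions.setdefault(cond, set()).add(dec)
          return {cond: decs.pop()
                  for cond, decs in decisions.items() if len(decs) == 1}

      if __name__ == "__main__":
          blocks = [TABLE[:3], TABLE[3:]]        # decompose into sub-tables
          with Pool(processes=2) as pool:        # dispatch to parallel task performers
              metas = pool.map(block_meta, blocks)
          for cond, dec in synthesize(metas).items():
              print(f"IF conditions={cond} THEN decision={dec}")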

  20. A mine of information: Benthic algal communities as biomonitors of metal contamination from abandoned tailings

    Energy Technology Data Exchange (ETDEWEB)

    Lavoie, Isabelle; Lavoie, Michel; Fortin, Claude, E-mail: fortincl@ete.inrs.ca

    2012-05-15

    Various biomonitoring approaches were tested in the field to assess the response of natural periphytic algal communities to chronic metal contamination downstream from an abandoned mine tailings site. The accumulation of cadmium (Cd), copper (Cu), lead (Pb) and zinc (Zn), as well as the production of phytochelatins, the presence of diatom taxa known to tolerate high metal concentrations, diatom diversity, and the presence of teratologies were determined. We observed highly significant relationships between intracellular metal and calculated free metal ion concentrations. Such relationships are often observed in laboratory studies but have rarely been validated in field studies. These results suggest that the concentration of metal inside field-collected periphyton, regardless of its species composition, is a good indicator of exposure and an interesting proxy for bioavailable metal concentrations in natural waters. The presence of teratologies and metal-tolerant taxa at our contaminated sites provided a clear indication that diatom communities were responding to this metal stress. A multi-metric approach integrating various bioassessment methods could be used for the field monitoring of metal contamination and the quantification of its effects. Highlights: ► Various approaches for metal contamination biomonitoring were used in the field. ► Metal accumulation in periphyton is correlated to free ion concentration. ► Teratologies and metal-tolerant taxa provided a clear indication of metal stress. ► Stream periphyton shows great potential as a biomonitor of metal contamination.